# IAB303 - Assessment Task 2
## TOWS analysis report

#### INSTRUCTIONS

1. Complete the section below with your personal details (and run the cell)
2. Choose to use either the supplied scenario OR your own scenario. If selecting your own, check suitability with teaching team. If using the supplied scenario, use the provided internal data. You may supplement this with additional data as required.
3. Ensure that you include at least 1 complete analysis using *internal* data
4. Ensure that you include at least 1 complete analysis using *external* data
5. Ensure that you include at least 1 actionable recommendation from a TOWS analysis using your data analytics from steps 3 & 4.
6. Ensure that you use markdown cells to document your thinking and decision making for each stage of the process. Be clear on how your decisions are working towards addressing the business concern.
7. Ensure that you undertake a peer review process and complete the peer review section
8. Before handing in your notebook, clear all cell outputs and run the complete notebook. Ensure that it runs without errors and that all output is displaying
9. Right-click on your notebook name (in file viewer) and select download. Ensure that your name and student ID are on the file, and then upload to the appropriate assignment upload link in Blackboard.

In [None]:
# Complete the following cell with your details and run to produce your personalised header for this assignment

from IPython.display import display, HTML

first_name = "Cai"
last_name = "Liosatos"
student_number = "n10514295"

personal_header = "<h1>"+first_name+" "+last_name+" ("+student_number+")</h1>"
display(HTML(personal_header))

---

## SCENARIO

### Use the scenario below, OR write a description of your own scenario: 

You are working as a business analytics consultant for a non-profit organisation that offers residential aged care in Brisbane. The organisation has recently received philanthropic funding to help enhance community understanding of the needs of the sector, improve government action, and improve their influence within the sector.
Before taking action and spending the money, they would like to have a better understanding of strengths and weaknesses in their service area through an analysis of relevant service providers. They have provided you with a report on [Service Places from 2021](https://www.gen-agedcaredata.gov.au/Resources/Access-data/2022/April/GEN-data-Providers,-services-and-places-in-aged-ca) for this purpose. They would also like to know about possible opportunities and threats which may impact the objective. They have suggested using data on [what Australians think of aged care](https://data.gov.au/dataset/ds-dga-2eae3889-8a5e-413a-9496-5fd80f7ae370/details?q=aged%20care), supplemented by an analysis of relevant Australian headlines from [the Guardian](https://www.theguardian.com/au) online news.



For the purposes of this exercise, you can make up other aspects of the scenario which may be important to your analysis (e.g. location, business details), and you may choose to use other sources of data if they are helpful.

---


### [1] Business Concern

*## Expand on your interpretation of your chosen scenario here, and be clear in identifying the business concern that the analysis will address. ##*


enhance community understanding of the needs of the sector,  <br />
improve government action, <br />
improve their influence within the sector<br />

strengths and weaknesses in their service area through an analysis of relevant service providers<br />
opportunity and threats which may impact the objective

external:
- covid
- death
- increasing prices
- accessibility
- staffing
- legislation/laws
- mandatory cultural/religious changes/assistance 

internal:
- number of diff org types per state
- remoteness
- operational places
- look at ACPR name, potentially look at total count/operational place for all of sydney, bris.....
- accessibility
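
Most of the internal ideas listed above reduce to grouped counts and sums. The following is a minimal sketch on made-up rows, not the real GEN file; the column names `State`, `Organisation Type` and `Operational Places` are assumptions and may not match the actual headers.

```python
import pandas as pd

# Made-up rows; the column names are assumptions, not the real GEN headers
services = pd.DataFrame({
    "State": ["QLD", "QLD", "NSW", "NSW", "NSW"],
    "Organisation Type": ["Charitable", "Private", "Charitable", "Charitable", "Government"],
    "Operational Places": [60, 120, 45, 80, 30],
})

# number of services of each organisation type per state (0 where a type is absent)
org_types_per_state = (
    services.groupby(["State", "Organisation Type"]).size().unstack(fill_value=0)
)

# total operational places per state
places_per_state = services.groupby("State")["Operational Places"].sum()

print(org_types_per_state)
print(places_per_state)
```

The same pattern would apply to a region column such as the ACPR name if totals per planning region are wanted.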

In [None]:
# import libraries needed for this notebook here
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import requests
from bs4 import BeautifulSoup
import urllib.request
import json
import os

cache_on = True

### Services places:


- ACPR code - code for the Aged Care Planning Region (the 'place')
- MMM code - Modified Monash Model remoteness code; it does not map 1:1 onto the remoteness column
- service size - needs to be grouped into bands to be readable (1-20, 21-40, 41-60, 61-80, 81-100, 101+)
- operational places - number of places
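
One way the service size bands could be built is with `pd.cut`. This is a sketch under assumptions: the rows are made up, and `Operational Places` is an assumed column name standing in for whichever column holds the size of each service.

```python
import pandas as pd

# Made-up operational place counts; "Operational Places" is an assumed column name
places = pd.DataFrame({"Operational Places": [12, 20, 21, 75, 100, 101, 240]})

# bin edges are right-inclusive, so (0, 20] holds 1-20, (20, 40] holds 21-40, and so on
edges = [0, 20, 40, 60, 80, 100, float("inf")]
labels = ["1-20", "21-40", "41-60", "61-80", "81-100", "101+"]

places["Service Size"] = pd.cut(places["Operational Places"], bins=edges, labels=labels)

# count how many services fall into each band, in band order
size_counts = places["Service Size"].value_counts().sort_index()
print(size_counts)
```

Because the labels form an ordered categorical, plots and tables built from `Service Size` keep the bands in size order rather than alphabetical order.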

### survey:


weight: population weight expressed as thousands of people in the Australian population aged 18 years
or more (i.e. a value of 3 is 3,000 people).
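
Since each respondent stands in for a number of people, raw row counts and population estimates can differ. A minimal sketch on made-up rows, assuming the weight column is literally named `weight`:

```python
import pandas as pd

# Made-up respondents; "weight" is an assumed column name, HQGender follows the codebook notes
survey = pd.DataFrame({
    "HQGender": [1, 2, 2, 1],
    "weight": [3.0, 1.5, 2.5, 0.5],
})

# raw respondent counts per gender code
raw_counts = survey.groupby("HQGender").size()

# weights are in thousands of people, so multiply by 1000 for a population estimate
weighted_people = survey.groupby("HQGender")["weight"].sum() * 1000

print(raw_counts)
print(weighted_people)
```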

SCR1 Do you agree to participate in the survey?<br />
SINGLE RESPONSE<br />
1. Yes
2. No

SCR3 Do you currently live in<br />
SINGLE RESPONSE<br />
READ OUT<br />
1. Residential aged care or a nursing home
2. A place you own
3. A place you rent
4. Somewhere else

HQGender: “allocated gender” for weighting from SCR4<br />
1. Male
2. Female

Hqareatypex: ARIA derived from SCR5<br />
1. Metro
2. Regional
3. Remote

age bracket (scr7)<br />
SCR7 To which of the following age groups do you belong?<br />
SINGLE RESPONSE<br />
READ OUT LIST<br />
1. Under 18
2. 18-24
3. 25-34
4. 35-44
5. 45-54
6. 55-64
7. 65-69
8. 70-79
9. 80-89
10. 90 or older
99. Prefer not to say

house income (scr9)<br />
SCR9 I will read a list of income ranges, please tell me which one is the best estimate of your TOTAL<br />
approximate annual income from all sources, before tax.<br />
SINGLE RESPONSE<br />
DO NOT READ PER WEEK UNLESS NEEDED<br />
$1 to $9,999 per year ($1 - $189 per week) **1**<br />
$10,000 - $19,999 per year ($190 - $379 per week) **2**<br />
$20,000 - $29,999 per year ($380 - $579 per week) **3**<br />
$30,000 - $39,999 per year ($580 - $769 per week) **4**<br />
$40,000 - $49,999 per year ($770 - $959 per week) **5**<br />
$50,000 - $59,999 per year ($960 - $1149 per week) **6**<br />
$60,000 - $79,999 per year ($1150 - $1529 per week) **7**<br />
$80,000 - $99,999 per year ($1530 - $1919 per week) **8**<br />
$100,000 - $124,999 per year ($1920 - $2399 per week) **9**<br />
$125,000 - $149,999 per year ($2400 - $2879 per week) **10**<br />
$150,000 - $199,999 per year ($2880 - $3839 per week) **11**<br />
$200,000 or more per year ($3840 or more per week) **12**<br />
DON’T KNOW **98**<br />
REFUSED **99**<br />

q36/q40: if q36/q40 = 1, the number the respondent gave is stored in the matching `_other` column; if it is 99, the respondent did not know or refused, and the `_other` column is empty
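
The q36/q40 coding can be unpacked into a numeric column plus a "did not answer" flag. This is only a sketch on made-up rows; `Q36` and `Q36_other` are assumed column names based on the note above and should be checked against the CSV headers.

```python
import numpy as np
import pandas as pd

# Made-up rows; "Q36" and "Q36_other" are assumed column names
survey = pd.DataFrame({
    "Q36": [1, 99, 1],
    "Q36_other": [250.0, np.nan, 40.0],
})

# keep the numeric answer only where the respondent gave one (code 1)
survey["Q36_value"] = survey["Q36_other"].where(survey["Q36"] == 1)

# flag "don't know / refused" separately so it is not mistaken for missing data
survey["Q36_unknown"] = survey["Q36"] == 99

print(survey)
```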


### [2] Analysis of External Data - Opportunities and Threats

*## Include a full QDAVI cycle for your analysis. You must do at least one complete analysis on external data. Ensure that you document what you are doing and why you are doing it ##*

### Question:

### Data:

#### functions to create required data

##### functions for caching API search results for local use

In [None]:
# function to create a cached file containing the page pulled from the API
def cache_save(folder_url, current_page, cache_list):
    np.save(folder_url+str(current_page)+".npy", cache_list, allow_pickle=True, fix_imports=True)

# function to load the cached files into a list
def cache_load(folder_url):
    loaded_cache = []
    # return an empty list rather than None when no cache folder exists yet
    if os.path.exists(folder_url):
        # sort so the pages load in a repeatable order, and skip anything that is not a cache file
        for file in sorted(os.listdir(folder_url)):
            if file.endswith(".npy"):
                loaded_cache.append(np.load(folder_url+file, allow_pickle=True).tolist())
    return flatten_list(loaded_cache)

# function to flatten list of cached lists into a singular list
def flatten_list(cache_list):
    flat_list = []
    # Iterate through the outer list
    for element in cache_list:
        if type(element) is list:
            # If the element is of type list, iterate through the sublist
            for item in element:
                flat_list.append(item)
        else:
            flat_list.append(element)
    return flat_list

##### functions for calling to TheGuardian API, and cleaning the DF

In [None]:
# calling the api URL to get the total page count
def page_count(api_url_start, api_url_end):
    error_counter = 0
    while error_counter < 3:
        api_url = api_url_start + "1" + api_url_end
        content = requests.get(api_url)
        api_data = json.loads(content.content)
        if api_data['response']['status'] == "ok":
            page_count = int(api_data['response']['pages'])
            break
        else:
            error_counter += 1
    page_count = "API has errored three times in a row whilst trying to get page count, check if the URL is correct" if error_counter >= 3 else page_count
    return page_count

#  function to scrape the data from the page returned from the api into a list
def api_scraping(page_count, url_start, url_end, folder_url):
    error_msg = []
    error_counter = 0
    current_page = 1
    results_list = []
# while loop to add outputs to a list
    while current_page <= page_count:
        if error_counter < 3:
            cache_list = []
            api_url = url_start + str(current_page) + url_end
            content = requests.get(api_url)
            api_data = json.loads(content.content)
            if api_data['response']['status'] == "error":
                error_msg.append(api_data['response']['message'])
                error_counter += 1
            else:
                for item in api_data['response']['results']:
                    cache_list.append(item)
                    results_list.append(item)
        
                # caching results for local use
                if cache_on:
                    cache_save(folder_url, current_page, cache_list)
                current_page += 1
                error_counter = 0
                error_msg = []
        else:
            error_msg.append("API has errored three times in a row, giving up")
            break
    return results_list, error_msg

# Clean up dataframe to be more visually appealing, and easier to use
def df_creation(dataframe_name):
    dataframe_name = dataframe_name.rename(columns = {'id':'ID', 'type':'Type', 'sectionId':'SectionID', 'sectionName':'Section Name', 'webPublicationDate':'Web Publication Date', 'webTitle':'Web Title', 'webUrl':'URL', 'apiUrl':'API URL', 'isHosted':'Is Hosted', 'pillarId': 'Pillar ID', 'pillarName':'Pillar Name'}).copy()
    # clean the data to be more user friendly
    if "Web Publication Date" in dataframe_name.columns:
        dataframe_name["Web Publication Date"] = dataframe_name["Web Publication Date"].apply(lambda x: x.replace("T", " ").replace("Z", ""))
        if dataframe_name["Web Publication Date"].dtype == object:
            dataframe_name["Web Publication Date"] = pd.to_datetime(dataframe_name["Web Publication Date"], format="%Y-%m-%d %H:%M:%S")
        dataframe_name.sort_values(by='Web Publication Date', ascending=False, inplace=True)
        dataframe_name["Month"] = dataframe_name["Web Publication Date"].dt.month
        dataframe_name["Year"] = dataframe_name["Web Publication Date"].dt.year
    if "Pillar ID" in dataframe_name.columns:
        dataframe_name["Pillar ID"] = dataframe_name["Pillar ID"].astype(str).apply(lambda x: x.replace("pillar/", ""))
    
    return dataframe_name

def duplicate_check(original_df):
    master_list = []
    result_list = []
    for index, row in original_df.iterrows():
        dynamic_map = {}
        dynamic_map["ID"] = row["ID"]
        dynamic_map["Type"] = row["Type"]
        dynamic_map["SectionID"] = row["SectionID"]
        dynamic_map["Section Name"] = row["Section Name"]
        dynamic_map["Web Publication Date"] = row["Web Publication Date"]
        dynamic_map["Web Title"] = row["Web Title"]
        dynamic_map["URL"] = row["URL"]
        dynamic_map["API URL"] = row["API URL"]
        dynamic_map["Is Hosted"] = row["Is Hosted"]
        dynamic_map["Pillar ID"] = row["Pillar ID"]
        dynamic_map["Pillar Name"] = row["Pillar Name"]
        dynamic_map["Month"] = row["Month"]
        dynamic_map["Year"] = row["Year"]
        result_list.append(dynamic_map)

    for item in range(len(result_list)):
        if result_list[item] not in master_list:
            master_list.append(result_list[item])  

    new_df = pd.DataFrame(master_list)
    return new_df

##### functions for HTML, and cleaning the constructed DF

In [None]:
def df_cleaning(df):
    if df["Date"].dtype == object:
        df["Date"] = pd.to_datetime(df["Date"], format="%Y-%m-%d %H:%M:%S")
    df.sort_values(by='Date', ascending=False, inplace=True)
    
    if "Month" not in df.columns:
        df["Month"] = df["Date"].dt.month
    if "Year" not in df.columns:
        df["Year"] = df["Date"].dt.year
    return df

# Get HTML function
def get_HTML(url):
    response = requests.get(url)
    html = response.text
    return html

# Beautiful soup function for subtitle
def extract_subTitle(HTML):
    soup = BeautifulSoup(HTML, "html.parser") # the html input and the parser name
    article = soup.find("main") # the tag that contains the article
    if article is None:
        return '-'
    div_element = article.find("div", attrs={"data-gu-name": "standfirst"}) # the tag that can be found using an attribute
    # some pages have no standfirst block, so guard before searching inside it
    target_element = div_element.find("p") if div_element else None
    if target_element:
        return target_element.text
    else:
        return '-'

# Beautiful soup function for live articles
def parse_article(article, temp_result):
    fig_element = article.find("figure")
    if fig_element:
        temp_result = ''
    else:
        for child in article.children:   
            if child.name == 'p':
                temp_result += child.text + '\n'
            if child.name == 'ul':
                for li in child.findAll('li'):
                    if li.find('ul'):
                        break
                    temp_result += li.text + '\n'
        temp_result += '\n'
    return temp_result

# Beautiful soup function for body
def extract_body(HTML):
    result = ""
    soup = BeautifulSoup(HTML, "html.parser") # the html input and the parser name

    news = soup.find("main", attrs={"data-layout": "LiveLayout"})
    if news:
        div_element = news.find("div", attrs={"id": "liveblog-body"}) # the tag that can be found using an attribute
        # an unexpected page layout yields an empty body instead of an AttributeError
        div_art_element = div_element.findAll("article") if div_element else []
        for item in div_art_element:
            result += parse_article(item, '')
    else:
        news = soup.find("main") # the tag that contains the article
        div_element = news.find("div", attrs={"id": "maincontent"}) if news else None # the tag that can be found using an attribute
        target_elements = div_element.findAll("p") if div_element else []
        for te in target_elements:
            result += te.text + '\n'*2

    return result

def scraping_df(df):
    results_data_list = []

    for index, row in df.iterrows():
        # download each article once and reuse the HTML for both extractions
        article_html = get_HTML(row["URL"])

        dynamic_map = {}
        dynamic_map["Date"] = row["Web Publication Date"]
        dynamic_map["Section"] = row["Section Name"]
        dynamic_map["Title"] = row["Web Title"]
        dynamic_map["Subtitle"] = extract_subTitle(article_html)
        dynamic_map["Body"] = extract_body(article_html)

        results_data_list.append(dynamic_map)
        
    scraped_data_df = df_cleaning(pd.DataFrame(results_data_list))
    
    return scraped_data_df

#### code using said functions

##### pulling results from TheGuardian api from 2017 onwards under the search condition of "Aged care facility"

In [None]:
# setting important variables
aged_care_results = []
aged_care_api_url_start = "https://content.guardianapis.com/search?from-date=2017-01-01&order-by=newest&page="
aged_care_api_url_end = "&page-size=50&q=%22Aged%20care%20facility%22&api-key=dd3e21c9-be37-4bdb-b311-2fd86c0fd153"

# getting the total page count
aged_care_page_count = page_count(aged_care_api_url_start, aged_care_api_url_end)
# making the cache folder directory
if cache_on:
    if os.path.exists("./cache/Aged-Care-Facility/") is False:
        os.makedirs("./cache/Aged-Care-Facility/")

# printing first result of data if no errors, else printing the error (formatted)
if type(aged_care_page_count) == int:
    aged_care_results, error_msg = api_scraping(aged_care_page_count, aged_care_api_url_start, aged_care_api_url_end, "./cache/Aged-Care-Facility/")
    print(f"{error_msg[3]}\n\nThese were the error messages:\n1: {error_msg[0]}\n2: {error_msg[1]}\n3: {error_msg[2]}\n\nMake sure the URL is correct, then try again") if len(error_msg) > 3 else print(aged_care_results[0])
else:
    print(aged_care_page_count)



# 'https://content.guardianapis.com/search?from-date=2017-01-01&order-by=newest&page=1&page-size=50&q=%22Aged%20care%20facility%22&api-key=dd3e21c9-be37-4bdb-b311-2fd86c0fd153'
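
The search URL could also be assembled from a dictionary of parameters rather than two string halves, which makes the page number and query easier to change. This is a sketch only: `build_search_url` is a hypothetical helper that is not part of this notebook, and `YOUR_API_KEY` is a placeholder. The parameter names are the ones already used in the URLs above.

```python
from urllib.parse import urlencode

BASE_URL = "https://content.guardianapis.com/search"

def build_search_url(page, query, api_key):
    # build_search_url is a hypothetical helper, not part of the notebook
    params = {
        "from-date": "2017-01-01",
        "order-by": "newest",
        "page": page,
        "page-size": 50,
        "q": f'"{query}"',  # quoted, matching the %22...%22 in the notebook's URLs
        "api-key": api_key,
    }
    # urlencode escapes the quotes and spaces in the query
    return BASE_URL + "?" + urlencode(params)

url = build_search_url(1, "Aged care facility", "YOUR_API_KEY")
print(url)
```

Note that `urlencode` writes spaces as `+` rather than `%20`; both are standard encodings of a space in a query string.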

In [None]:
# loading cache
if cache_on:
    aged_care_results = cache_load("./cache/Aged-Care-Facility/")

In [None]:
ac_df = df_creation(pd.DataFrame(aged_care_results))
ac_df

##### pulling results from TheGuardian api from 2017 onwards under the search condition of "residential aged care"

In [None]:
# setting important variables
residential_care_results = []
residential_care_api_url_start = "https://content.guardianapis.com/search?from-date=2017-01-01&order-by=newest&page="
residential_care_api_url_end = "&page-size=50&q=%22Residential%20Aged%20Care%22&api-key=dd3e21c9-be37-4bdb-b311-2fd86c0fd153"

# getting the total page count
residential_care_page_count = page_count(residential_care_api_url_start, residential_care_api_url_end)

# making the cache folder directory
if cache_on:
    if os.path.exists("./cache/Residential-Aged-Care/") is False:
        os.makedirs("./cache/Residential-Aged-Care/")

# printing first result of data if no errors, else printing the error (formatted)
if type(residential_care_page_count) == int:
    residential_care_results, error_msg = api_scraping(residential_care_page_count, residential_care_api_url_start, residential_care_api_url_end, "./cache/Residential-Aged-Care/")
    print(f"{error_msg[3]}\n\nThese were the error messages:\n1: {error_msg[0]}\n2: {error_msg[1]}\n3: {error_msg[2]}\n\nMake sure the URL is correct, then try again") if len(error_msg) > 3 else print(residential_care_results[0])
else:
    print(residential_care_page_count)


# 'https://content.guardianapis.com/search?from-date=2017-01-01&order-by=newest&page=1&page-size=50&q=%22Residential%20Aged%20Care%22&api-key=dd3e21c9-be37-4bdb-b311-2fd86c0fd153'

In [None]:
# loading cache
if cache_on:
    residential_care_results = cache_load("./cache/Residential-Aged-Care/")

In [None]:
rc_df = df_creation(pd.DataFrame(residential_care_results))
rc_df

In [None]:
# combine the two dataframes together, removing duplicate entries
final_api_data = duplicate_check(pd.concat([ac_df, rc_df]))
final_api_data
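
pandas also provides `DataFrame.drop_duplicates`, which could express the combine-and-deduplicate step more compactly. The sketch below uses made-up rows with only two of the columns; it illustrates the built-in method and is not a drop-in replacement for `duplicate_check`.

```python
import pandas as pd

# Made-up search results; the real dataframes have more columns
ac = pd.DataFrame({"ID": ["a", "b"], "Web Title": ["First", "Second"]})
rc = pd.DataFrame({"ID": ["b", "c"], "Web Title": ["Second", "Third"]})

# stack both result sets, drop rows that are identical in every column, and renumber
combined = pd.concat([ac, rc]).drop_duplicates().reset_index(drop=True)
print(combined)
```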

##### using web scraping to retrieve certain aspects of the acquired theguardian articles

In [None]:
final_scrape_df = scraping_df(final_api_data)
final_scrape_df

##### pulling the data from the provided survey csv

In [None]:
# load the community attitudes survey responses
survey_df = pd.read_csv('for-release-community-attitudes-survey.csv', low_memory=False)
survey_df

### Analysis

#### functions for searching through the DF's for keywords, and removing duplicate entries

In [None]:
# searches the inputted original list of data scraped from the API for the inputted keywords, then creates a new dataframe
def keyword_search(new_df, keywords, check):
    results_list = []
    master_list = []

    for index, row in new_df.iterrows():
        # defining variables for later use
        null_count = 0
        keyword_map = {}
        item_Date = row["Date"]
        item_Title = row["Title"]
        item_Subtitle = row["Subtitle"]
        item_Body = row["Body"]
        item_Month = row["Month"]
        item_Year = row["Year"]

        # searching for the keywords inputted
        for keyword in keywords:
            keyword_map[keyword] = item_Body.lower().count(keyword.lower())
            if keyword_map[keyword] == 0:
                null_count += 1
        dynamic_object = {}

        # setting column variables in dynamic_object dict based on found keywords
        dynamic_object["Date"] = item_Date
        dynamic_object["Title"] = item_Title
        dynamic_object["Subtitle"] = item_Subtitle
        dynamic_object["Body"] = item_Body
        dynamic_object["Month"] = item_Month
        dynamic_object["Year"] = item_Year

        if check:
            for keyword in keywords:
                dynamic_object[keyword] = keyword_map[keyword]
        # appending found articles to list
        if null_count < len(keywords):
            results_list.append(dynamic_object)

    # removing duplicate entries from list (function can be called using multiple dataframes, and will work dynamically)
    for item in range(len(results_list)):
        if results_list[item] not in master_list:
            master_list.append(results_list[item])
            
    # creating dataframe with found results
    master_df = pd.DataFrame(master_list)
    master_df.sort_values(by='Date', ascending=False, inplace=True)

    return master_df


#### functions for creating histograms, and providing frequency count

In [None]:
# Function to create a simple histogram
def create_hist(df, col, title, xlabel, ylabel):
    bins = list(range(df[col].min(), df[col].max()+2))
    hist = df[col].hist(bins=bins)
    hist.set(title=title, xlabel=xlabel, ylabel=ylabel)
    return hist

# Function to create df containing the frequency of unique values in a given column in a given dataframe
def count_generator(df, col):
    rows = []
    counts = df[col].value_counts()

    # iterate over every value between the min and max, including values that never occur
    for number in range(int(df[col].min()), int(df[col].max()) + 1):
        # use the observed frequency, or 0 when the value is absent
        item_count = counts[number] if number in df[col].values else 0

        # append values to object, then to list
        rows.append({col: number, "Count": item_count})
    new_df = pd.DataFrame(rows)
    return new_df

def multi_column_counter(df):
    list = []
    number = 1
    
    for item in range(12):
        map = {}
        item_Month = number
        
        if number in df["Month"].values:
            df2 = df[(df["Month"] == number)]
            item_death = df2['death'].sum()
            item_covid = df2['covid'].sum()
            item_corona = df2['corona'].sum()
            item_increase = df2['increase'].sum()
            item_staff = df2['staff'].sum()
            item_laws = df2['laws'].sum()
            item_legislat = df2['legislat'].sum()
        else:
            item_death = 0
            item_covid = 0
            item_corona = 0
            item_increase = 0
            item_staff = 0
            item_laws = 0
            item_legislat = 0

        # append values to object, then to list
        map["Month"] = number
        map["death"] = item_death
        map["covid"] = item_covid
        map["corona"] = item_corona
        map["increase"] = item_increase
        map["staff"] = item_staff
        map["laws"] = item_laws
        map["legislat"] = item_legislat
        list.append(map)
        number +=1
    return pd.DataFrame(list)


In [None]:
def survey_analysis(df, questions, questions_answers, gender):
    df_list = []
    age_map = {1: "Under 18", 2: "18-24", 3: "25-34", 4: "35-44", 5: "45-54", 6: "55-64", 7: "65-69", 8: "70-79", 9: "80-89", 10: "90 or older", 99: "Prefer not to say"}
    for question in questions:
        index = questions.index(question)        
        result_list = []
        for answer in list(range(questions_answers[index][0], questions_answers[index][1] + 1)):
            dynamic_map = {} 
            dynamic_map["Answers"] = answer
            for k, v in age_map.items():
                dynamic_map[v] = len(df.loc[(df[question] == answer) & (df["Age Bracket"] == k) & (df["HQGender"] == gender)])
            result_list.append(dynamic_map)
        df_list.append(pd.DataFrame(result_list))
        
    return df_list

In [None]:
# def survey_analysis(df, questions, questions_answers):
#     df_list = []
#     age_map = {1: "Under 18", 2: "18-24", 3: "25-34", 4: "35-44", 5: "45-54", 6: "55-64", 7: "65-69", 8: "70-79", 9: "80-89", 10: "90 or older", 99: "Prefer not to say"}
#     for question in questions:
#         index = questions.index(question)        
#         for gender in range(2):
#             result_list = [[]]
#             for answer in list(range(questions_answers[index][0], questions_answers[index][1] + 1)):
#                 dynamic_map = {} 
#                 dynamic_map["Answers"] = answer
#                 for k, v in age_map.items():
#                     dynamic_map[v] = len(df.loc[(df[question] == answer) & (df["Age Bracket"] == k) & (df["HQGender"] == gender + 1)])
#                 result_list[gender].append(dynamic_map)
#             df_list.append(pd.DataFrame(result_list))
        
#     return df_list

#### coding using said functions

##### checking scraped results for relevant keywords

In [None]:
# identify desired keywords (articles relating to deaths, COVID-19, cost increases, staffing, and legislation)

# create DF of matching articles, with a count column for each keyword
keywords_df1 = keyword_search(final_scrape_df, ["death", "covid", "corona", "increase", "staff", "laws", "legislat"], True)

keywords_df1

In [None]:
# same keyword search as above (deaths, COVID-19, cost increases, staffing, and legislation)

# create DF of matching articles, without the per-keyword count columns
keywords_df2 = keyword_search(final_scrape_df, ["death", "covid", "corona", "increase", "staff", "laws", "legislat"], False)

keywords_df2

##### creating new DFs only containing relevant information for analysis

In [None]:
year_keywords_list = []
year_keywords_count_list = []
for i in range(int(final_scrape_df["Year"].min()), int(final_scrape_df["Year"].max())+1):
    keywords_df = keywords_df1.loc[keywords_df1['Year'] == i].sort_values(by="Month", ascending=True)
    year_keywords_list.append(keywords_df)
    year_keywords_count_list.append(multi_column_counter(keywords_df))

In [None]:
year_df_list = []
year_count_list = []
for i in range(int(final_scrape_df["Year"].min()), int(final_scrape_df["Year"].max())+1):
    year_df = keywords_df1.loc[keywords_df1['Year'] == i].sort_values(by="Month", ascending=True)
    year_df_list.append(year_df)
    year_count_list.append(count_generator(year_df, "Month"))

##### Analysis of data from survey

In [None]:
questions_list = ["Q2_I", "Q3", "Q18_C", "Q18_D", "Q20_B", "Q26_C", "Q27", "Q32A_A", "Q32A_B", "Q32A_C", "Q32A_D", "Q32A_E", "Q32A_F", "Q32A_G", "Q32A_H", "Q32A_I", "Q32A_J", "Q32b_A", "Q32b_B", "Q32b_C", "Q32b_D", "Q32b_E", "Q32b_F", "Q32b_G", "Q32b_H", "Q32b_I", "Q32b_J", "Q36"]
questions_answers_list = [[1, 3], [1, 5], [1, 3], [1, 3], [1, 3], [1, 3], [1, 13], [1, 3], [1, 3], [1, 3], [1, 3], [1, 3], [1, 3], [1, 3], [1, 3], [1, 3], [1, 3], [1, 3], [1, 3], [1, 3], [1, 3], [1, 3], [1, 3], [1, 3], [1, 3], [1, 3], [1, 3], [1, 99]] 
survey_male_analysis_df_list = survey_analysis(survey_df, questions_list, questions_answers_list, 1)
survey_female_analysis_df_list = survey_analysis(survey_df, questions_list, questions_answers_list, 2)

survey_analysis_df_list = [survey_male_analysis_df_list, survey_female_analysis_df_list]

for m/f
s1
- scr3
- age (SCR7)
- Q2_i
- Q3
--
s4?
Q18_C
Q18_D
Q20_B
--
s6
Q27
Q26_C

Q32_A
Q32_B

Q36


q5
q14
q24

### Visualisation:

In [None]:
# Visualise the number of keyword-matching articles published per month, one subplot per year
fig, (ax0, ax1, ax2, ax3, ax4, ax5) = plt.subplots(nrows=6,  ncols=1)
fig.suptitle("Frequency of Article Publications relating\nto aged care services", fontweight="bold", size=15)
fig_list = [ax0, ax1, ax2, ax3, ax4, ax5]
year = final_scrape_df["Year"].min()

for count, df in enumerate(year_count_list):
    fig_list[count].set_title(f"Frequency of Article Publication by Month in {year}", fontweight="bold", size=13) 
    df.plot('Month', 'Count', ax=fig_list[count], label="Count", xticks=list(range(df["Month"].min(), df["Month"].max()+1)), xlabel="Month", ylabel="Frequency", figsize=(5,10))
    fig_list[count].legend(loc='upper right')
    year += 1
fig.tight_layout()


In [None]:
# Visualise the monthly keyword counts, one subplot per year
fig, (ax0, ax1, ax2, ax3, ax4, ax5) = plt.subplots(nrows=6,  ncols=1, figsize=(10,15))
fig.suptitle("Frequency of Article Publications per year with certain keywords", fontweight="bold", size=15)
fig_list = [ax0, ax1, ax2, ax3, ax4, ax5]
year = final_scrape_df["Year"].min()

for count, df in enumerate(year_keywords_count_list):
    fig_list[count].set_title(f"Frequency of Article Publication by Month in {year}", fontweight="bold", size=13) 
    df.plot("Month", ["death", "covid", "corona", "increase", "staff", "laws", "legislat"], ax=fig_list[count], kind="bar", xlabel="Month", ylabel="Frequency")
    fig_list[count].legend(bbox_to_anchor=(1.05, 1), loc='upper left', borderaxespad=0.)
    year += 1
fig.tight_layout()

In [None]:
questions_list = ["Q2_I", "Q3", "Q18_C", "Q18_D", "Q20_B", "Q26_C", "Q27", "Q32A_A", "Q32A_B", "Q32A_C", "Q32A_D", "Q32A_E", "Q32A_F", "Q32A_G", "Q32A_H", "Q32A_I", "Q32A_J", "Q32b_A", "Q32b_B", "Q32b_C", "Q32b_D", "Q32b_E", "Q32b_F", "Q32b_G", "Q32b_H", "Q32b_I", "Q32b_J", "Q36"]

# Visualise the survey responses by gender (male in the left column, female in the right)
fig, axes = plt.subplots(nrows=25, ncols=2, constrained_layout = True)
fig.suptitle("Survey Responses by Gender", fontweight="bold", size=20, y=18.15)
fig.subplots_adjust(left=0.125, bottom=0.1, right=0.9, top=18, wspace=None, hspace=None)

# flatten the 25x2 grid row by row so consecutive entries are the left/right pair of one row
fig_list = axes.flatten().tolist()

fig_count = 0
if len(survey_analysis_df_list) > 0:
    for index in range(len(survey_analysis_df_list[0])):
        if len(survey_analysis_df_list[0][index]) != 3:
            continue
        
        fig_list[fig_count].set_title(f"Male responses to survey {questions_list[index]}", fontweight="bold", size=12)
        survey_analysis_df_list[0][index].plot("Answers", ["Under 18", "18-24", "25-34", "35-44", "45-54", "55-64", "65-69", "70-79", "80-89", "90 or older", "Prefer not to say"], ax=fig_list[fig_count], kind="bar", xlabel="Answer", ylabel="Frequency", figsize=(15,5))
        fig_list[fig_count].legend().remove()
        fig_count += 1

        fig_list[fig_count].set_title(f"Female responses to survey {questions_list[index]}", fontweight="bold", size=12)
        survey_analysis_df_list[1][index].plot("Answers", ["Under 18", "18-24", "25-34", "35-44", "45-54", "55-64", "65-69", "70-79", "80-89", "90 or older", "Prefer not to say"], ax=fig_list[fig_count], kind="bar", xlabel="Answer", ylabel="Frequency", figsize=(15,5))
        fig_list[fig_count].legend(ncol=1, labelspacing=0., bbox_to_anchor=(1.4, .7), borderaxespad=0.)
        fig_count += 1


### Insight:

### [3] Analysis of Internal Data - Strengths and Weaknesses

*## Include a full QDAVI cycle for your analysis. You must do at least one complete analysis on internal data. Ensure that you document what you are doing and why you are doing it  ##*

In [None]:
# load the 2020-21 GEN services and places data
GEN_2021_data_df = pd.read_csv('ServicesPlaces_2020to2021_GENdata.csv')

GEN_2021_data_df

In [None]:
GEN_2020_data_df = pd.read_csv('ServicesPlaces_2019to2020_GENdata.csv', encoding = 'cp1252')
GEN_2020_data_df

In [None]:
GEN_2019_data_df = pd.read_csv('Services-and-places-in-aged-care-30-June-2019.csv', encoding = 'cp1252')
GEN_2019_data_df

### [4] TOWS analysis - actionable recommendations

*## Using your analytics from [2] and [3], perform a TOWS analysis to identify actionable recommendations. You must complete at least one quadrant of TOWS. Elaborate on your recommendation/s linking to the analysis, and also ensuring a meaningful connection to the business concern. ##*

---

### [5] Peer review

#### Feedback received - reviewer 1: Firstname Lastname (Student number)

*## Write comments here ##*

#### Feedback received - reviewer 2: Firstname Lastname (Student number)

*## Write comments here ##*

#### Feedback received - reviewer 3: Firstname Lastname (Student number)

*## Write comments here ##*

#### Response to feedback received:

*## Write response here ##*

#### Feedback given to reviewer 1:

*## Write comments here ##*

#### Feedback given to reviewer 2:

*## Write comments here ##*

#### Feedback given to reviewer 3:

*## Write comments here ##*