
# Introduction

This assessment aims to evaluate your understanding and application of the concepts covered in the Data Analytics course. You will be tasked with analyzing a dataset related to remote work and mental health, utilizing various data manipulation, statistical analysis, and visualization techniques learned throughout the course. This exercise will help reinforce your skills in data handling, exploratory analysis, and drawing meaningful insights from data.

### Submission Details:

The deadline for submission is 16 November at 11:59 PM. Specific submission details will be shared with you shortly.

### Passing Criteria:

To successfully pass this assessment, you must achieve a score of 80% or higher.
We encourage you to engage with the material and demonstrate your analytical skills. Good luck!


---



# Section 1 - Beginner (25%)


## Shopping Cart System with Discounts

Write a Python program to simulate a shopping cart system for an online store. The program will calculate the total cost of items, apply discounts, and check if the total exceeds a specified budget.

1.	Variables and Lists:
  - Define a `budget` variable with an initial value of 200.
  
  - Create two empty lists called `item_names_list` and `item_prices_list` to store the name and price of each item separately.

In [None]:
# Write your code here
budget = 200
item_names_list = []
item_prices_list = []

2. Functions:
  - Write a function `add_item_to_cart(item_name, item_price)` that takes the item’s name and price as arguments, appends the name to `item_names_list` and the price to `item_prices_list`, and returns both updated lists.
  
  - Write a function `calculate_total(item_prices)` that calculates and returns the total cost of all items in item_prices.

    Conditions:
    - If the total cost exceeds the budget after adding an item, print "Budget exceeded!" and stop adding more items.
    - If the total cost is within budget and exceeds $100, apply a 10% discount on the total and print the discounted total.

In [None]:
# Write your code here
def add_item_to_cart(item_name, item_price):
  item_names_list.append(item_name)
  item_prices_list.append(item_price)
  return item_names_list, item_prices_list

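def calculate_total(item_prices):
  # Sum the cart and compare it against the global budget.
  total = sum(item_prices)
  if total > budget:
    # Returning None signals the caller to stop adding items.
    print("Budget exceeded!")
    return None
  elif total > 100:
    # Within budget and above $100: show the total after the 10% discount.
    print("Discounted total: ", total * 0.9)
  else:
    # No discount applies at or below $100.
    print("Total (no discount): ", total)
  return total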
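
The conditions above can also be written as a function that returns values instead of printing them, which makes the rule easier to check. This is a minimal sketch, assuming the discount applies only when the total is within budget and strictly above 100. The name `price_cart` and its parameters are hypothetical and not part of the assignment.

```python
def price_cart(item_prices, budget=200, discount_threshold=100, discount_rate=0.10):
    """Return (total, final_total, within_budget) for a list of prices.

    Hypothetical helper: the default values mirror the task text,
    but the function itself is only an illustration.
    """
    total = sum(item_prices)
    if total > budget:
        # Over budget: no discount is applied and the caller should stop adding items.
        return total, total, False
    if total > discount_threshold:
        # Within budget and above the threshold: apply the discount.
        return total, total * (1 - discount_rate), True
    # Within budget and at or below the threshold: the total is unchanged.
    return total, total, True


# Made-up carts used only for illustration.
print(price_cart([30, 40]))    # small cart
print(price_cart([60, 60]))    # cart above the discount threshold
print(price_cart([150, 80]))   # cart above the budget
```

Returning a tuple keeps the pre-discount total, the final total, and the budget check available to the caller, so the printing can live in one place.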

3.	Loop and Input:
  - Start the input only once the user says 'start'
  - Use a loop to allow the user to add items to the cart by entering an item name and price. The loop should stop when the user types 'done'.
  - For each item, add it to item_names and item_prices using add_item_to_cart, then update the total cost using calculate_total.

Output:
  - After the loop ends, display the final cart with each item and its price, the initial total, any applicable discount, and the final total.


In [None]:
# Write your code here
def shopping_cart():
  start = input("Enter 'start' to start: ")
  if start != "start":
    print("Invalid input")
    return

  while True:
    item_name = input("Enter item name: ")
    if item_name == "done":
      break
    item_price = float(input("Enter item price: "))

    add_item_to_cart(item_name, item_price)

    print("Item added to cart")
    print("Current cart: ", item_names_list, item_prices_list)
    total = calculate_total(item_prices_list)
    if total is None:
      print("Exiting due to budget exceeded.")
      break
    print("Current total: ", total)

  print("Final cart: ", item_names_list, item_prices_list)
  # Compute the final total once; calling calculate_total again would repeat its printed output.
  final_total = calculate_total(item_prices_list)
  if final_total is not None:
    print("Final total: ", final_total)
  return item_names_list, item_prices_list, final_total

In [None]:
# Write your code here
shopping_cart()
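
The output step asks for each item next to its price, followed by the initial total, any discount, and the final total. One way to build that report without user input is sketched below. `format_cart_summary` is a hypothetical helper, the cart contents are made up, and the 10% discount above 100 is taken from the task text.

```python
def format_cart_summary(item_names, item_prices, discount_threshold=100, discount_rate=0.10):
    """Build the final cart report as a list of text lines (hypothetical helper)."""
    lines = ["Final cart:"]
    # zip pairs each name with the price stored at the same position.
    for name, price in zip(item_names, item_prices):
        lines.append(f"  {name}: ${price:.2f}")
    initial_total = sum(item_prices)
    # The discount is zero unless the total is above the threshold.
    discount = initial_total * discount_rate if initial_total > discount_threshold else 0.0
    lines.append(f"Initial total: ${initial_total:.2f}")
    lines.append(f"Discount: ${discount:.2f}")
    lines.append(f"Final total: ${initial_total - discount:.2f}")
    return lines


# Made-up cart used only for illustration.
for line in format_cart_summary(["book", "lamp"], [45.0, 75.0]):
    print(line)
```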

# Section 2 - Intermediate (55%) - Remote Work and Mental Health Analysis

Dataset source: Kaggle (https://www.kaggle.com/datasets/waqi786/remote-work-and-mental-health)




## Objective:
- In the following sections, you will explore the "Remote Work and Mental Health" dataset using Python and different data science libraries such as Pandas, NumPy and Matplotlib.
- Follow the instructions below to complete each task. Please provide code for each question and any observations as comments when necessary.

In [None]:
# Import necessary modules and libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

## 1. Load Dataset (2 marks)
- Instructions: Load the dataset using Pandas and display the first few rows.
- Question: Describe the overall structure (rows, columns, data types) as a comment at the end of your code.


In [None]:
# Write code here
df = pd.read_csv('remote_work_mental_health.csv')
print(df.head())
print(df.shape)
print(df.dtypes)
df.info()  # info() prints its summary itself and returns None, so no print() is needed
print(df.describe())

## 2. Display 'n' Rows (3 marks)
- Instructions: Display the first 13 rows of the dataset.

In [None]:
# Write code here
print(df.head(13))

- Instructions: Display the last 7 rows of the dataset

In [None]:
# Write code here
print(df.tail(7))

## 3. Find the Number of Null Values in the Dataset (2 marks)

In [None]:
# Write code here
print(df.isnull().sum())
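
Per-column counts are easier to interpret next to the share of rows they represent. The sketch below shows both on a tiny made-up frame. Only the column names are borrowed from the notebook, and the values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Made-up rows; only the column names come from the notebook.
toy = pd.DataFrame({
    "Job_Role": ["HR", "Sales", None, "HR"],
    "Hours_Worked_Per_Week": [40.0, np.nan, 35.0, np.nan],
})

null_counts = toy.isnull().sum()         # missing values per column
null_share = toy.isnull().mean() * 100   # the same information as a percentage of rows
total_nulls = int(null_counts.sum())     # a single number for the whole frame

print(null_counts)
print(null_share)
print(total_nulls)
```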

## 4. Statistical Summary for Numeric Columns (10 marks)
Instructions: Use individual commands to find the statistical summary.

- Count

In [None]:
# Write code here
print(df.select_dtypes(include='number').count())  # non-null count for numeric columns only

- Mean

In [None]:
# Write code here
print(df.select_dtypes(include='number').mean())

- Standard Deviation

In [None]:
# Write code here
print(df.select_dtypes(include='number').std())

- Quartiles

In [None]:
# Write code here
print(df.select_dtypes(include='number').quantile([0.25, 0.5, 0.75]))
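
The quartiles are often summarised further as the interquartile range (IQR), the distance between the 25th and 75th percentiles. A small sketch on made-up values follows. It relies on the default linear interpolation of `quantile`, and the numbers are not taken from the dataset.

```python
import pandas as pd

# Made-up values used only for illustration.
hours = pd.Series([20, 30, 40, 50, 60], name="Hours_Worked_Per_Week")

quartiles = hours.quantile([0.25, 0.5, 0.75])    # indexed by the requested quantiles
iqr = quartiles.loc[0.75] - quartiles.loc[0.25]  # spread of the middle half of the data

print(quartiles)
print(iqr)
```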

## 5. Calculate Extrema (2 marks)

In [None]:
# Write code here
print(df.select_dtypes(include='number').min())
print(df.select_dtypes(include='number').max())

## 6. Find Unique Values in a Categorical Column (3 marks)

- Instructions: Identify the unique values in the `Job_Role` column (2 marks)
- Question: How many unique roles are represented in the dataset? (1 mark)

In [None]:
# Write code here
print(df['Job_Role'].unique())
print(df['Job_Role'].nunique())

## 7. Group Data and Calculate Mean (4 marks)
- Instructions: Group the dataset by `Job_Role` and calculate the mean of the `Work_Life_Balance_Rating` for each role.
- Question: Which job role has the highest average Work life balance?

In [None]:
# Write code here
# Mean Work_Life_Balance_Rating per job role (printing the bare groupby object only shows its memory address)
role_means = df.groupby('Job_Role')['Work_Life_Balance_Rating'].mean()
print(role_means)
print(role_means.sort_values(ascending=False))
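
To answer the question directly rather than by reading the sorted output, `idxmax` returns the label of the largest group mean. The sketch below uses a made-up frame with the notebook's column names, so the resulting role says nothing about the real dataset.

```python
import pandas as pd

# Made-up ratings used only for illustration.
toy = pd.DataFrame({
    "Job_Role": ["HR", "HR", "Sales", "Sales", "Designer"],
    "Work_Life_Balance_Rating": [3, 5, 2, 3, 3],
})

role_means = toy.groupby("Job_Role")["Work_Life_Balance_Rating"].mean()
best_role = role_means.idxmax()   # index label (job role) with the highest mean

print(role_means.sort_values(ascending=False))
print(best_role, role_means.max())
```

If two roles tie for the highest mean, `idxmax` reports only the first one, so the sorted output is still worth checking.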

## 8. Filter Data Based on Condition (4 marks)
- Instructions: Filter the dataset to show only rows where `Hours_Worked_Per_Week` is greater than 40.
- Question: How many employees are working overtime?

In [None]:
# Write code here
print(df[df['Hours_Worked_Per_Week'] > 40])
print(df[df['Hours_Worked_Per_Week'] > 40].shape)
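
The filter is a boolean mask, and summing that mask gives the count without reading it off `shape`. A minimal sketch on made-up hours follows. Note that a row with exactly 40 hours is excluded because the condition is strictly greater than 40.

```python
import pandas as pd

# Made-up values used only for illustration.
toy = pd.DataFrame({"Hours_Worked_Per_Week": [38, 40, 41, 55, 20]})

overtime_mask = toy["Hours_Worked_Per_Week"] > 40   # True where hours exceed 40
overtime_rows = toy[overtime_mask]                  # the filtered rows
overtime_count = int(overtime_mask.sum())           # each True counts as 1

print(overtime_rows)
print(overtime_count)
```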

## 9. Histogram of Work Hours per Week (5 marks)
- Instructions: Create a histogram of `Hours_Worked_Per_Week` (4 marks).
- Question: Describe the distribution of work hours. Are most employees working around a certain number of hours per week? (1 mark)

In [None]:
# Write code here
plt.hist(df['Hours_Worked_Per_Week'])
plt.xlabel('Hours Worked Per Week')
plt.ylabel('Frequency')
plt.title('Histogram of Hours Worked Per Week')
plt.show()
print(df['Hours_Worked_Per_Week'].describe())
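
When describing the distribution, the bin counts behind the plot can help. `np.histogram` computes counts and bin edges without drawing anything. The values and the choice of 4 bins below are made up for illustration.

```python
import numpy as np

# Made-up values used only for illustration.
hours = np.array([20, 25, 33, 38, 40, 41, 45, 52, 58, 60])

# Four equal-width bins between the minimum and maximum value.
counts, edges = np.histogram(hours, bins=4)

# Each bin includes its left edge; only the last bin also includes its right edge.
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:.0f} to {right:.0f}: {count}")
```

Roughly equal counts across bins suggest a flat distribution, while one dominant bin suggests that most employees cluster around that range of hours.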

## 10. Scatter Plot of Work Hours vs. Years_of_Experience (4 marks)
- Instructions: Create a scatter plot with `Hours_Worked_Per_Week` on the x-axis and `Years_of_Experience` on the y-axis.

In [None]:
# Write code here
plt.scatter(df['Hours_Worked_Per_Week'], df['Years_of_Experience'])
plt.xlabel('Hours Worked Per Week')
plt.ylabel('Years of Experience')
plt.title('Scatter Plot of Hours Worked Per Week vs. Years of Experience')
plt.show()

## 11. Bar Chart of Average Work Life Balance by Job Role (5 marks)
- Instructions: Create a bar chart showing the average `Work_Life_Balance_Rating` for each `Job_Role` (4 marks).
- Question: Which job roles have the highest and lowest average Work Life Balance? (1 mark)

In [None]:
# Write code here
# Compute the group means once and reuse them for the bar positions and heights
role_means = df.groupby('Job_Role')['Work_Life_Balance_Rating'].mean()
plt.bar(role_means.index, role_means)
plt.xlabel('Job Role')
plt.ylabel('Average Work Life Balance Rating')
plt.title('Bar Chart of Average Work Life Balance by Job Role')
plt.show()
print(df.groupby('Job_Role')['Work_Life_Balance_Rating'].mean().sort_values(ascending=False))

## 12. Pie Chart of Access to Mental Health Resources (5 marks)
- Instructions: Use a pie chart to show the proportion of `Access_to_Mental_Health_Resources` (Yes and No) in the dataset (4 marks).
- Question: What percentage of employees have access to mental health resources? (1 mark)

In [None]:
# Write code here
# Count each answer once and reuse the result for the slice sizes and labels
access_counts = df['Access_to_Mental_Health_Resources'].value_counts()
plt.pie(access_counts, labels=access_counts.index, autopct='%1.1f%%')
plt.title('Pie Chart of Access to Mental Health Resources')
plt.show()
print(df['Access_to_Mental_Health_Resources'].value_counts())
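
The percentage asked for in the question can be computed directly with `value_counts(normalize=True)` instead of being read off the chart labels. The answers below are made up and do not reflect the real dataset.

```python
import pandas as pd

# Made-up answers used only for illustration.
access = pd.Series(["Yes", "No", "No", "Yes", "No"],
                   name="Access_to_Mental_Health_Resources")

# normalize=True returns fractions of the total; multiply by 100 for percentages.
share = access.value_counts(normalize=True) * 100

print(share.round(1))
```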

## 13. Scatter Plot of Age vs. Social Isolation Rating (6 marks)
- Instructions: Create a scatter plot with `Age` on the x-axis and `Social_Isolation_Rating` on the y-axis (4 marks).
- Question: Do you observe any trends or relationships between age and social isolation? Is there a noticeable impact of age on isolation? (2 marks)

In [None]:
# Write code here
plt.scatter(df['Age'], df['Social_Isolation_Rating'])
plt.xlabel('Age')
plt.ylabel('Social Isolation Rating')
plt.title('Scatter Plot of Age vs. Social Isolation Rating')
plt.show()
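
A scatter plot of a rating against age tends to show overlapping horizontal bands, so a numeric summary can support the visual reading. The sketch below computes the Pearson correlation on made-up values. It measures linear association only, and treating an ordinal rating as numeric is an assumption.

```python
import pandas as pd

# Made-up values used only for illustration.
toy = pd.DataFrame({
    "Age": [25, 30, 35, 40, 45],
    "Social_Isolation_Rating": [3, 1, 4, 2, 5],
})

# Series.corr defaults to the Pearson coefficient, which lies between -1 and 1.
r = toy["Age"].corr(toy["Social_Isolation_Rating"])

print(round(r, 2))
```

A coefficient near 0 would suggest no linear relationship between age and the rating, while values near -1 or 1 would suggest a strong one.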

# Section 3 - Long Answer/Advanced (20%)



## Job Role and Workload Level Impact on Mental Health

Instructions: Investigate the influence of job role and workload level on mental health.
- Create a new column `workload_level` that labels each entry as "High" if the `Hours_Worked_Per_Week` is above its mean, otherwise "Low." (5 marks)
- Group the dataset by `Industry` and calculate the average `Hours_Worked_Per_Week` for each combination. (5 marks)
- Use a bar chart to display the average `Stress_Level` for each job role, with separate bars for high and low stress levels. (5 marks)
- Analyze the results: Which job roles and workload levels appear to have the greatest impact on mental health? (5 marks)


In [None]:
# Write code here
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.read_csv('remote_work_mental_health.csv')
# Compute the mean once instead of recalculating it for every row
mean_hours = df['Hours_Worked_Per_Week'].mean()
df['workload_level'] = np.where(df['Hours_Worked_Per_Week'] > mean_hours, 'High', 'Low')
# Average hours for each Job_Role / workload_level combination
df_grouped = df.groupby(['Job_Role', 'workload_level'])['Hours_Worked_Per_Week'].mean().reset_index()
print(df_grouped)

df_high = df_grouped[df_grouped['workload_level'] == 'High']
plt.figure(figsize=(10, 5))
plt.bar(df_high['Job_Role'], df_high['Hours_Worked_Per_Week'], color='red')
plt.xlabel('Job Role')
plt.ylabel('Average Hours Worked Per Week')
plt.title('Average Hours Worked Per Week for High Workload Level')
plt.show()

df_low = df_grouped[df_grouped['workload_level'] == 'Low']
plt.figure(figsize=(10, 5))
plt.bar(df_low['Job_Role'], df_low['Hours_Worked_Per_Week'], color='blue')
plt.xlabel('Job Role')
plt.ylabel('Average Hours Worked Per Week')
plt.title('Average Hours Worked Per Week for Low Workload Level')
plt.show()
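
For the side-by-side comparison of workload levels within each job role, one option is to reshape the grouped means with `unstack` so that each workload level becomes its own column. The sketch below uses made-up rows and a hypothetical numeric column `Stress_Score`, because how `Stress_Level` is encoded in the real dataset is not shown in this notebook.

```python
import numpy as np
import pandas as pd

# Made-up rows. `Stress_Score` is a hypothetical numeric column used only for this sketch.
toy = pd.DataFrame({
    "Job_Role": ["HR", "HR", "HR", "Sales", "Sales", "Sales"],
    "Hours_Worked_Per_Week": [30, 50, 55, 25, 35, 60],
    "Stress_Score": [2, 4, 5, 1, 2, 5],
})

# Label each row relative to the overall mean of hours worked.
mean_hours = toy["Hours_Worked_Per_Week"].mean()
toy["workload_level"] = np.where(toy["Hours_Worked_Per_Week"] > mean_hours, "High", "Low")

# Rows are job roles, columns are workload levels, cells are mean stress scores.
stress_table = (
    toy.groupby(["Job_Role", "workload_level"])["Stress_Score"]
    .mean()
    .unstack("workload_level")
)

print(stress_table)
# stress_table.plot(kind="bar") would draw one group of bars per job role (requires matplotlib).
```

If the real `Stress_Level` column holds text labels, it would first need a numeric mapping before a mean can be taken.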