# Housekeeper survey report

## Importing libraries and documents

In [2]:
import pandas as pd

import warnings
warnings.filterwarnings('ignore')

import plotly.express as px
import plotly.graph_objects as go

In [59]:
df = pd.read_csv("../Data/survey_2401.csv")

## Data Cleaning

In [60]:
df = df.drop(columns = ["Marca de temps"]) #Deleting the timestamp column, which is not relevant for the analysis
df.columns = ["language", "age", "share_flat", "flatmates", "animals", "relationship", 
              "clean_satisfaction", "clean_outsource", "schedule_tool", "task_rotation", 
              "tasks", "shared_expenses","alone_all", "use_app", "feature_task", "feature_expenses", 
              "feature_reports", "feature_calendar", "feature_shoplist"] #Renaming the columns for easier manipulation

In [61]:
df_tasks = df['tasks'].str.get_dummies(sep=', ').add_prefix('task_') #One-hot encoding the multi-choice tasks column
df = pd.concat([df, df_tasks], axis=1) # Joining the main df with the encoded tasks df
df = df.drop(columns = "task_Cleaning my room") # Dropping the "Cleaning my room" indicator column
df = df.replace("Si", "Yes").replace("Friends, family", "Friends, Family") # Harmonising answer labels

In [62]:
df.head(3)

Unnamed: 0,language,age,share_flat,flatmates,animals,relationship,clean_satisfaction,clean_outsource,schedule_tool,task_rotation,...,feature_reports,feature_calendar,feature_shoplist,task_Buying common items,task_Cleaning bathroom/toilets,task_Cleaning kitchen,task_Cleaning living-room,task_Cleaning windows,task_Laundry,task_Throwing the garbage
0,English,25–30,Yes,3,Cats,Acquaintances,3.0,No,We just remember,Once a week,...,Very likely,Very likely,Very likely,1,1,1,1,1,1,1
1,English,18–24,Yes,2,No pets,"Friends, Family",4.0,No,White board on the fridge door,Once a week,...,Very likely,Likely,Very likely,1,1,1,1,0,0,1
2,English,25–30,Yes,3,No pets,"Friends, Family",3.0,Yes,,,...,,,,0,0,0,0,0,0,0


## Dataset description

After cleaning and encoding, the dataset contains a total of 26 columns:
- `language`: the language of the survey
- `age`: the age group of the respondent
- `share_flat`: funnel question I, keeps only people who share a flat
- `flatmates`: the total number of people living in the house or flat
- `animals`: whether the respondent lives with animals and, if so, what kind
- `relationship`: the kind of relationship the respondent has with his/her flatmates
- `clean_satisfaction`: satisfaction with the cleanliness of the flat (from 1, the worst, to 5, the best)
- `clean_outsource`: funnel question II, whether the cleaning tasks are outsourced in the respondent's home
- `schedule_tool`: how the household tasks are organized or scheduled
- `task_rotation`: how often tasks are rotated in the respondent's flat
- `shared_expenses`: how shared expenses are handled in the respondent's house
- `alone_all`: whether the respondent prefers cleaning alone or all together with his/her flatmates
- `use_app`: funnel question III, whether the respondent would use an app to organise the household tasks
- `features` (the `feature_*` columns): how likely the respondent is to use each feature, on a scale from 1 (Very unlikely) to 5 (Very likely)
- `tasks` (the `task_*` columns): encoded columns that indicate which tasks the respondent performs at home
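
As an illustration of how the `tasks` answers are encoded, the sketch below applies `str.get_dummies` to a small made-up sample (the three answers are hypothetical, not taken from the survey). Each distinct task becomes a 0/1 indicator column.

```python
import pandas as pd

# Hypothetical multi-choice answers, separated by ", " as in the survey export
sample = pd.Series([
    "Laundry, Cleaning kitchen",
    "Cleaning kitchen",
    None,  # respondent who did not reach this question
])

# One indicator column per distinct task; a missing answer becomes a row of zeros
encoded = sample.str.get_dummies(sep=", ").add_prefix("task_")
print(encoded)
```

A respondent who never saw the question ends up with zeros everywhere, which is why the third row of `df.head(3)` shows 0 for every task.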

## The Funnel approach

Perhaps the most important part of the survey process is the creation of questions that accurately measure the opinions, experiences and behaviors of the public. Accurate random sampling and high response rates will be wasted if the information gathered is built on a shaky foundation of ambiguous or biased questions. Creating good measures involves both writing good questions and organizing them to form the questionnaire.

In this case, the survey has been created with a funnel approach. This technique involves starting with general questions, and then drilling down to a more specific point in each. Usually, this will involve asking for more and more detail at each level.

In [63]:
def funnel(df, col):
    
    """ Function that takes a column and calculates the count of Yes/No answers """
    
    df = pd.DataFrame(df.groupby(col)[col].agg("count"))
    return df

In [64]:
#Applying the function to the columns
share_funnel = funnel(df, "share_flat")
outsource_funnel = funnel(df, "clean_outsource")
app_funnel = funnel(df, "use_app")

In [65]:
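#Plotting the results
# groupby sorts the answers alphabetically, so row 0 is "No" and row 1 is "Yes"
# (this assumes "No" and "Yes" are the only answers in each funnel column)
stages = ["Share flat", "Don't outsource cleaning", "Willing to use the app", "Potential customers"]
# Respondents who continue at each stage; for outsourcing, the continuing answer is "No" (row 0)
yes_numbers = [share_funnel.iat[1,0], outsource_funnel.iat[0,0], app_funnel.iat[1,0], app_funnel.iat[1,0]]
df_yes = pd.DataFrame(dict(number=yes_numbers, stage=stages))
df_yes['Answer'] = 'Yes'
# Respondents filtered out at each stage; the last stage has no drop-outs
no_numbers = [share_funnel.iat[0,0], outsource_funnel.iat[1,0], app_funnel.iat[0,0], 0]
df_no = pd.DataFrame(dict(number=no_numbers, stage=stages))
df_no['Answer'] = 'No'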
df_funnel = pd.concat([df_yes, df_no], axis=0)
fig = px.funnel(df_funnel, x='number', y='stage', color='Answer', width=900, height=400)
fig.update_layout(
    title="<b>Questions funnel</b>",
    font=dict(
        family="Courier New, monospace",
        size=12
    ))
fig.show()

The graph shows the funnel stages of the survey and the number of responses filtered at each stage. The survey received 230 answers in total across its 4 languages. The first funnel (sharing a flat or not) filtered 66% of the total responses. The second funnel (outsourcing the cleaning tasks or not) filtered 38% of the total responses. Finally, the last funnel, which corresponds to the people willing to use an app to organize the household tasks, filtered 17% of the total responses.
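
To make stage percentages of this kind easier to reproduce, the following sketch derives them from stage counts. The counts are placeholders chosen for illustration only and are not the survey results.

```python
import pandas as pd

# Placeholder counts, NOT the survey results: respondents remaining after each stage
stage_counts = pd.Series({
    "All responses": 200,
    "Share flat": 120,
    "Don't outsource cleaning": 80,
    "Willing to use the app": 30,
})

# Share of all responses that reach each stage
pct_of_total = (stage_counts / stage_counts.iloc[0] * 100).round(1)

# Share of the previous stage that continues to this one (undefined for the first row)
pct_of_previous = (stage_counts / stage_counts.shift(1) * 100).round(1)

summary = pd.DataFrame({
    "count": stage_counts,
    "% of total": pct_of_total,
    "% of previous stage": pct_of_previous,
})
print(summary)
```

Reporting both columns side by side makes it explicit whether a percentage refers to all responses or only to the respondents who reached the previous stage.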

 

## STAGE 0: Sharing or not sharing

In [66]:
def plot_variable(df, col, agg):
    
    """Function that takes a df, a column and an aggregate function (e.g. "count") and plots the percentage share of each value of the column as a bar chart"""
    
    df = pd.DataFrame(df.groupby(col)[col].agg(agg))
    df = df * 100 / df.sum(axis=0)
    df = df.unstack().reset_index()
    df = df.sort_values(by=col, ascending=True)
    fig = px.bar(df, x=col, y=0)
    return fig
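
The percentage step inside `plot_variable` can be checked without plotting. A minimal sketch, using a made-up column, that computes the same shares by hand and with `value_counts(normalize=True)`:

```python
import pandas as pd

# Made-up answers for illustration; None stands for respondents who skipped the question
toy = pd.DataFrame({"animals": ["No pets", "Cats", "No pets", "Dogs", None, "No pets"]})

# Same computation as plot_variable with agg="count": count per answer, scaled to 100
by_group = toy.groupby("animals")["animals"].agg("count")
by_group = by_group * 100 / by_group.sum()

# Equivalent shortcut: value_counts drops missing values by default and normalises to shares
shortcut = toy["animals"].value_counts(normalize=True).sort_index() * 100

print(by_group.round(1))
```

In both versions the missing answer is excluded, so the percentages are relative to the respondents who actually answered the question.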

The graph below shows whether the survey respondents live in a shared flat, broken down by age group. Age and flat sharing show a clear negative correlation.
A negative, or inverse, correlation means that one variable increases as the other decreases, and vice versa. This relationship may or may not reflect causation between the two variables, but it does describe an observable pattern in the market, which is that <b>younger age groups are more likely to live in shared flats.</b>
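
A rough way to quantify such an inverse relationship is a rank correlation between the ordered age groups and the share of flat-sharers in each. The sketch below uses invented percentages and an assumed set of age brackets (labelled as such in the code); it does not reproduce the survey result.

```python
import pandas as pd

# Invented shares of flat-sharers per age bracket (illustration only).
# The brackets beyond "25–30" are assumed, not taken from the survey.
share_yes = pd.Series(
    [80.0, 65.0, 40.0, 20.0],
    index=["18–24", "25–30", "31–40", "41+"],
)

# Encode the ordered brackets as 0, 1, 2, ... so they can be ranked
age_order = pd.Series(range(len(share_yes)), index=share_yes.index)

# Spearman rank correlation, computed as the Pearson correlation of the ranks
rho = age_order.rank().corr(share_yes.rank())
print(rho)
```

A value close to -1 indicates that the share falls steadily as the age bracket rises; with only a handful of brackets it should be read as a description of the pattern rather than as a statistical test.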

In [67]:
share_flat = pd.DataFrame(df.groupby(["age", "share_flat"])["share_flat"].agg("count")).dropna()
unstack = share_flat["share_flat"].unstack()
unstack["total"] = unstack.sum(axis=1)
unstack["No"] = unstack["No"] * 100 / unstack["total"]
unstack["Yes"] = unstack["Yes"] * 100 / unstack["total"]

# Plotting the results
labels = list(unstack.index)
fig = go.Figure(data=[
        go.Bar(name='Yes', x=labels, y=unstack["Yes"]),
        go.Bar(name='No', x=labels, y=unstack["No"])])
    
# Change the bar mode
fig.update_layout(barmode='group', width=600, height=400,
    title="<b>Sharing or not sharing per age group</b>",
    xaxis_title="Age group",
    yaxis_title="Percentage",
    font=dict(
        family="Courier New, monospace",
        size=12))

## STAGE 1: Flat-sharing individuals

### Number of flatmates per age group

<b>Among the individuals who share their home, the average number of flatmates is 3.</b>
In almost every age group, the number of flatmates follows a roughly exponential distribution. The youngest age group (18-24) tends to live with more people.

In [68]:
flatmates_ages = pd.DataFrame(df.groupby(["flatmates", "age"])["flatmates"].agg(total="count"))
flatmates_ages = flatmates_ages / flatmates_ages.groupby(level=1).sum() * 100 #Calculating percentages within each age group
unstack = flatmates_ages["total"].unstack()
unstack.loc["total"] = unstack.sum()
unstack.loc["percentage"] = 0
labels = list(unstack.columns)
fig = go.Figure(data=[
        go.Bar(name='2', x=labels, y=unstack.iloc[0,:]),
        go.Bar(name='3', x=labels, y=unstack.iloc[1,:]),
        go.Bar(name='4', x=labels, y=unstack.iloc[2,:]),
        go.Bar(name='5', x=labels, y=unstack.iloc[3,:]),
        go.Bar(name='More than 5', x=labels, y=unstack.iloc[4,:])])
    # Change the bar mode

fig.update_layout(barmode='group', width=700, height=400,
    title="<b>Number of flatmates per age group</b>",
    xaxis_title="Age group",
    yaxis_title="Percentage",
    font=dict(
        family="Courier New, monospace",
        size=12
    ))
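
The same within-age-group percentages can also be sketched with `pd.crosstab` and `normalize="columns"`, which avoids dividing a MultiIndex frame by hand. The rows below are made up for illustration.

```python
import pandas as pd

# Made-up respondents (not survey data)
toy = pd.DataFrame({
    "age": ["18–24", "18–24", "18–24", "25–30", "25–30", "25–30", "25–30"],
    "flatmates": ["2", "3", "3", "2", "2", "2", "3"],
})

# Rows: number of flatmates, columns: age group; each column sums to 100
pct = pd.crosstab(toy["flatmates"], toy["age"], normalize="columns") * 100
print(pct.round(1))
```

Because the result is indexed by label, the bars can then be selected with `pct.loc["2"]` rather than by row position.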

### Animals

Most of the individuals clearly don't have pets in their shared flats. Among those who do, pets are equally distributed between cats and dogs.

In [69]:
animals = plot_variable(df, "animals", "count")

animals.update_layout(
    title="<b>Animals in the shared spaces</b>",
    xaxis_title="Animals",
    yaxis_title="Percentage", width=500, height=350,
    font=dict(
        family="Courier New, monospace",
        size=12
    ))

### Relationship with flatmates

Regarding the kind of flatmates that respondents have, most of them (71%) live with friends or family. Both acquaintances and people met on an app account for 15% of the total individuals. People living with partners represent 5% of the individuals sharing their flat.

By age group, the graph below shows that younger individuals are more likely to share their space with people they met on apps.

In [70]:
relationship_ages = pd.DataFrame(df.groupby(["relationship", "age"])["relationship"].agg(total="count"))
relationship_ages = relationship_ages / relationship_ages.groupby(level=1).sum() * 100 #Calculating percentages over age groups
unstack = relationship_ages["total"].unstack()
unstack.loc["total"] = unstack.sum()
unstack.loc["percentage"] = 0
labels = list(unstack.columns)
fig = go.Figure(data=[
        go.Bar(name='Acquaintances', x=labels, y=unstack.iloc[0,:]),
        go.Bar(name='Friends, Family', x=labels, y=unstack.iloc[1,:]),
        go.Bar(name='Partner', x=labels, y=unstack.iloc[3,:]),
        go.Bar(name='People I met on app', x=labels, y=unstack.iloc[4,:]),
        go.Bar(name='Others', x=labels, y=unstack.iloc[2,:])])
    # Change the bar mode

fig.update_layout(barmode='group', width=800, height=400,
    title="<b>Relationship with flatmates per age group</b>",
    xaxis_title="Age group",
    yaxis_title="Percentage",
    font=dict(
        family="Courier New, monospace",
        size=12
    ))

### Cleanliness satisfaction

Regarding cleanliness satisfaction, most people are satisfied with the cleanliness of their home. Only 6% of the individuals sharing a flat report clearly poor satisfaction with the cleanliness of their space.

In [71]:
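# "count" gives the share of respondents per rating; "mean" on a column grouped by itself
# would only return the rating value, not how many respondents chose it
cleanliness = plot_variable(df, "clean_satisfaction", "count")
cleanliness.update_layout(
    title="<b>Cleanliness satisfaction</b>",
    xaxis_title="Rate 1-5",
    yaxis_title="Percentage", width=500, height=350,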
    font=dict(
        family="Courier New, monospace",
        size=12
    ))

### Outsourcing cleaning tasks

The second funnel question asked individuals who share a flat whether they outsource the household or cleaning tasks. The chart below shows that almost 40% of them do.

In [72]:
outsource = pd.DataFrame(df.groupby("clean_outsource")["clean_outsource"].agg("count"))
outsource = outsource * 100 / outsource.sum(axis=0)
outsource = outsource.unstack().reset_index()
fig = px.pie(outsource, values=0, names="clean_outsource")
fig.update_layout(
    title="<b>Percentage of individuals outsourcing cleaning tasks</b>", width=500, height=350,
    font=dict(
        family="Courier New, monospace",
        size=12
    ))

## STAGE 2: Sharing flat and not outsourcing the cleaning tasks

### Schedule tool

Regarding the tools used to schedule household tasks, a large share of the individuals (33%) do not schedule their tasks at all, and an even larger share (43%) just remember them. The remaining 25% of the individuals who share a flat and do not outsource the cleaning tasks use different kinds of tools:
* The most used tool is the traditional white board on the fridge (12%)
* Spreadsheets are used by 9% of the individuals
* Trello and Todoist are rarely used

In [73]:
schedule = plot_variable(df, "schedule_tool", "count")
schedule.update_layout(
    title="<b>Schedule tools percentages</b>",
    xaxis_title="Scheduling tools",
    yaxis_title="Percentage", width=800, height=400,
    font=dict(family="Courier New, monospace", size=12))
schedule.update_xaxes(tickangle = 30)

### Tasks rotation

More than half of the individuals rotate their tasks once a week (51%), and 13% rotate them twice a week. Individuals who change tasks once or twice a month represent 5% of the answers to this question. Finally, a notable share of responses (31%) indicate that the tasks are never rotated. In short, when tasks are rotated, it tends to happen often: 64% of the respondents rotate them at least once a week.

In [74]:
rotation = plot_variable(df, "task_rotation", "count")
rotation.update_layout(
    title="<b>Task rotation percentages</b>",
    xaxis_title="Rotation",
    yaxis_title="Percentage", width=800, height=400,
    font=dict(family="Courier New, monospace", size=12))
rotation.update_xaxes(tickangle = 30)

### The tasks

The graph below shows the tasks that individuals sharing a flat perform in their spaces. The tasks are performed at fairly similar rates. The most performed task is cleaning the kitchen (18%), followed by cleaning the bathroom or toilets (17%), buying common items (16%) and cleaning the living room (15%); together these represent more than 65% of the answers.
The least performed tasks are throwing out the garbage (13%), doing the laundry (13%) and cleaning the windows (9%).

In [75]:
#Selecting the encoded task columns (task_rotation shares the prefix but is not a 0/1 indicator)
tasks_df = df.loc[:, df.columns.str.startswith('task_') & (df.columns != 'task_rotation')]
tasks_total = pd.DataFrame(tasks_df.sum()).reset_index()
tasks_total[0] = tasks_total[0] * 100 / tasks_total[0].sum() #Converting counts to percentages of all task mentions
#Stripping the "task_" prefix for the axis labels
tasks_total['index'] = tasks_total['index'].str.split('_').str[-1].str.strip()
tasks_total = tasks_total.sort_values(0, ascending=False)

#Plotting the results
tasks = px.bar(tasks_total, x="index", y=0, width=800, height=400)
tasks.update_layout(
    title="<b>Tasks percentages</b>",
    xaxis_title="<b>Tasks</b>",
    yaxis_title="<b>Percentage</b>",
    font=dict(family="Courier New, monospace", size=12))



### Shared expenses

The graph below shows how individuals deal with the common or shared expenses of their homes. Almost half of them tend to split the cost of these expenses. A large proportion don't really think about the expenses (31%), and 16% buy the shared items in turns. Finally, a small share of the individuals (2.5%) buy their own stuff.


In [76]:
expenses = plot_variable(df, "shared_expenses", "count")
expenses.update_layout(
    title="<b>Dealing with shared expenses</b>",
    xaxis_title="Shared expenses",
    yaxis_title="Percentage", width=800, height=400,
    font=dict(family="Courier New, monospace", size=12))
expenses.update_xaxes(tickangle = 30)

### Cleaning alone or all together?

The figure below clearly shows that individuals prefer cleaning alone rather than all together with their flatmates.

In [77]:
alone_all = pd.DataFrame(df.groupby("alone_all")["alone_all"].agg("count"))
alone_all = alone_all * 100 / alone_all.sum(axis=0)
alone_all = alone_all.unstack().reset_index()
fig = px.pie(alone_all, values=0, names="alone_all")
fig.update_layout(
    title="<b>Cleaning alone or all together</b>", width=500, height=350,
    font=dict(
        family="Courier New, monospace",
        size=12
    ))

### Willingness to use an app

The last funnel question helps to select the app features by importance while keeping the answers unbiased. In this case, more than half of the individuals indicated that they would not use an app to organize the household tasks.

In [78]:
use_app = pd.DataFrame(df.groupby("use_app")["use_app"].agg("count"))
use_app = use_app * 100 / use_app.sum(axis=0)
use_app = use_app.unstack().reset_index()
fig = px.pie(use_app, values=0, names="use_app")
fig.update_layout(
    title="<b>Willingness to use the app</b>", width=500, height=350,
    font=dict(
        family="Courier New, monospace",
        size=12
    ))

## STAGE 3: Sharing flat, not outsourcing the cleaning tasks and willing to use the app

### The features

The graph below shows the distribution of the willingness to use each of the features:

* <b>Setting up tasks/shifts/house rules:</b> More than half of the individuals (59%) are likely or very likely to use this feature, while around 9% are unlikely or very unlikely to use it. Neutral answers represent 30% of the individuals.

* <b>Tracking common expenses:</b> 76% of the individuals are likely or very likely to use this feature and 19% feel neutral about it. The remaining 5% answered unlikely or very unlikely.

* <b>Receiving reports of each roommate's contribution:</b> Slightly more than half of the individuals (52%) are willing to use this feature. Neutral answers account for 33%, making it the most neutral feature. It is also the feature with the most Very unlikely answers.

* <b>Receiving notifications / reviewing task calendar:</b> 69% of the potential customers are likely or very likely to use this feature. A third of the answers are neutral. The last 14% are unlikely or very unlikely to use it.

* <b>Managing shopping list for household products:</b> This feature has one of the highest shares of likely or very likely answers (71%). Neutral answers represent 33%, while 12% of the individuals indicated low engagement with this feature.

By importance, the most useful tools for our potential customers are:
1. Tracking common expenses
2. Managing shopping list for household products
3. Receiving notifications / reviewing task calendar
4. Setting up tasks/shifts/house rules
5. Receiving reports of each roommate's contribution
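
One way to derive such a ranking is to sum the "Likely" and "Very likely" shares per feature and sort them. A minimal sketch with invented answers; the label strings are assumed to match the survey export.

```python
import pandas as pd

# Invented answers for two of the feature columns (illustration only)
toy = pd.DataFrame({
    "feature_expenses": ["Very likely", "Likely", "Neutral", "Very likely"],
    "feature_reports": ["Neutral", "Likely", "Very unlikely", "Unlikely"],
})

# Assumed label strings for a positive answer
positive = ["Likely", "Very likely"]

# Share of positive answers per feature, highest first.
# Missing answers would count in the denominator; drop them first if that is not wanted.
ranking = toy.isin(positive).mean().mul(100).sort_values(ascending=False)
print(ranking)
```

Selecting answers by label rather than by column position keeps the ranking correct even if the order of the answer categories changes.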

In [79]:
#Creating features df
features_df = df.loc[:, df.columns.str.startswith('feature')].copy()
feature_counts = pd.DataFrame([features_df["feature_task"].value_counts(), features_df["feature_expenses"].value_counts(),
             features_df["feature_reports"].value_counts(), features_df["feature_calendar"].value_counts(),
             features_df["feature_shoplist"].value_counts()])
feature_counts["total"] = feature_counts.sum(axis=1)
for col in feature_counts.columns:
    feature_counts[col] = feature_counts[col] * 100 / feature_counts["total"]
    
#Plotting
labels = list(feature_counts.index)
fig = go.Figure(data=[
        go.Bar(name='Very likely', x=labels, y=feature_counts.iloc[:,0]),
        go.Bar(name='Likely', x=labels, y=feature_counts.iloc[:,2]),
        go.Bar(name='Neutral', x=labels, y=feature_counts.iloc[:,1]),
        go.Bar(name='Unlikely', x=labels, y=feature_counts.iloc[:,4]),
        go.Bar(name='Very unlikely', x=labels, y=feature_counts.iloc[:,3])])
    # Change the bar mode

fig.update_layout(barmode='group', width=800, height=400,
    title="<b>Willingness to use each feature</b>",
    xaxis_title="Feature",
    yaxis_title="Percentage",
    font=dict(
        family="Courier New, monospace",
        size=12
    ))

## Our potential user

In [92]:
df_user = df.dropna() # Dropping rows with NaN: the potential user is considered the 
                      # one who made it to the end of the survey
df_user = df_user.drop(columns = ["share_flat", "use_app", "clean_outsource", "tasks"])
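
Note that `dropna()` without arguments removes every row that has at least one missing value, including respondents who reached the end but skipped a single question. The sketch below, on made-up rows, contrasts it with restricting the check to chosen columns through `subset`; which variant fits better is a judgment call.

```python
import pandas as pd

# Made-up respondents: the second stopped at the funnel question,
# the third reached the feature questions but skipped one of them
toy = pd.DataFrame({
    "use_app": ["Yes", "No", "Yes"],
    "feature_task": ["Likely", None, "Neutral"],
    "feature_reports": ["Neutral", None, None],
})

# dropna() removes every row with at least one missing value
complete = toy.dropna()

# Restricting the check to chosen columns keeps partially answered rows
partial_ok = toy.dropna(subset=["feature_task"])

print(len(complete), len(partial_ok))
```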

- Mostly aged 25 to 30 (40%).
- Most of them live with one other person (46%) or with two other people (30%).
- Half of them don't have a pet (51%). Cats live with 34% of our potential users and dogs with 15%.
- 59% live with friends or family. People met on apps and acquaintances represent 17% each.
- Our potential customers tend to rotate the tasks once a week (61%). Twice a week and never are the next most common options, with 17% each.
- Our target group is divided when it comes to dealing with shared expenses: 40% split the cost and another 40% do not really think about it. Buying in turns represents almost 20%.
- As in the previous stage, they prefer cleaning alone (63%) rather than all together.
  
  
Regarding the tasks:
- 87% clean the kitchen
- 82% clean the bathroom or toilets
- 78% clean the living room
- 70% buy common items
- 56% throw out the garbage
- 39% clean the windows
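
Because the task columns are 0/1 indicators, percentages like the ones above can be read off as column means. A sketch on a made-up frame (the values are not the survey's):

```python
import pandas as pd

# Made-up indicator columns for three potential users
toy_user = pd.DataFrame({
    "task_Cleaning kitchen": [1, 1, 1],
    "task_Laundry": [1, 0, 0],
    "task_Cleaning windows": [0, 1, 0],
    "age": ["25–30", "18–24", "25–30"],
})

# Keep only the indicator columns; the mean of 0/1 values is the share of users doing the task
task_cols = [c for c in toy_user.columns if c.startswith("task_")]
task_share = (toy_user[task_cols].mean() * 100).round(0).sort_values(ascending=False)
print(task_share)
```

Unlike the earlier task chart, where each bar is a share of all task mentions, these figures are shares of users, so they do not need to add up to 100.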