# Rover.com Analytics Assessment
**Andrew Nicholls** | Email: andrew.s.nicholls@gmail.com | [Github](https://github.com/Booleans)

If you are viewing this notebook on GitHub, you may want to switch to the nbviewer page so that the interactive charts render properly: https://nbviewer.jupyter.org/github/Booleans/rover-analytics-assessment/blob/master/analysis.ipynb

### Exercise Problems
Click the hyperlink to skip to the desired assessment section.

I. [Exploring the Database](#1)  
II. [Conversations and Bookings](#2)  
III. [Recent Daily Booking Rate](#3)   
IV. [Analyzing Take Rate](#4)   
V. [New Conversation Flow](#5)  
VI. [Search Engine Marketing](#6)

In [81]:
import pandas as pd
import numpy as np
import datetime 
from collections import Counter
import plotly.plotly as py
import cufflinks as cf

## Loading and Reviewing Available Data
It's important to look at and understand our available data before attempting an analysis.

#### Conversations_conversation

An owner can book a service provider by starting a conversation with them. This table stores a record for each conversation started on our platform. Many of the fields on this table are self-explanatory, but we have detailed a few below.

* start_date - This is the date for which pet care will first be needed.
* end_date - This is the last date for which pet care will be needed.
* units - This is the number of units of service that the owner is interested in booking.
* added - A timestamp for when this conversation was created.
* booking_total - This is the dollar amount (not including the owner’s service fee) that this
booking would cost.
* requester_id - This foreign key reports the people_person record for the pet owner that is
requesting pet care.
* service_id - This foreign key reports the services_service record for the service that the
pet owner is requesting.
* booked_at - If the request is booked, this timestamp reports when that occurred.
* cancelled_at - A booked request can be cancelled. In that case, this timestamp reports
when that occurred.

In [82]:
df_conversations = pd.read_csv('Data/conversations_conversation.csv', low_memory=False,
                               parse_dates=['added', 'start_date', 'end_date', 'booked_at','cancelled_at'])
print("Imported conversations_conversation.csv into df_conversations with shape: {}".format(df_conversations.shape))
df_conversations.head()

Imported conversations_conversation.csv into df_conversations with shape: (80180, 11)


| | id | start_date | end_date | units | added | booking_total | cancellation_fault | requester_id | service_id | booked_at | cancelled_at |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2018-07-26 | 2018-07-31 | 5 | 2018-07-16 10:17:53.460035 | 120 | | 64393 | 4646 | NaT | NaT |
| 1 | 2 | 2018-08-10 | 2018-08-16 | 6 | 2018-08-01 10:20:48.626868 | 132 | | 64392 | 10126 | NaT | NaT |
| 2 | 3 | 2018-06-16 | 2018-06-19 | 3 | 2018-06-05 16:46:39.542467 | 168 | | 64391 | 20677 | NaT | NaT |
| 3 | 4 | 2018-07-13 | 2018-07-20 | 7 | 2018-07-02 09:12:22.275923 | 490 | | 64391 | 3847 | NaT | NaT |
| 4 | 5 | 2018-07-02 | 2018-07-07 | 5 | 2018-06-21 16:02:48.694725 | 140 | | 64389 | 9982 | NaT | NaT |


#### Conversations_conversation_pets

Since a booking may involve many pets and a pet may have many bookings, it is necessary to store this many-to-many relationship in a separate table. Many of the fields on this table are self-explanatory, but we have detailed a few below.

* conversation_id - A foreign key to a booking request on the conversations_conversation
table. If this conversation involves caring for more than one pet, then this
conversation_id will occur on more than one row on this table (once for each pet).
* pet_id - A foreign key to a pet that will receive pet care during the corresponding
conversation’s booking.
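Because the link table has one row per (conversation, pet) pair, the relationship can be read from either direction with a groupby. The minimal sketch below uses toy rows copied from the sample output that follows. The names `conversation_pets`, `pets_per_conversation` and `conversations_per_pet` are illustrative only.

```python
import pandas as pd

# Toy link-table rows: one row per (conversation, pet) pair.
conversation_pets = pd.DataFrame({
    "conversation_id": [1, 2, 3, 3, 4],
    "pet_id": [77730, 77729, 77728, 77727, 77728],
})

# Pets involved in each conversation: distinct pet_ids per conversation_id.
pets_per_conversation = conversation_pets.groupby("conversation_id")["pet_id"].nunique()

# Conversations each pet appears in: the same table read in the other direction.
conversations_per_pet = conversation_pets.groupby("pet_id")["conversation_id"].nunique()

print(pets_per_conversation)
print(conversations_per_pet)
```

In this toy data, conversation 3 covers two pets, and pet 77728 appears in two conversations. Together these illustrate the many-to-many case.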

In [83]:
df_conversations_pets = pd.read_csv('Data/conversations_conversation_pets.csv', low_memory=False)
print("Imported conversations_conversation_pets.csv into df_conversations_pets with shape: {}".format(df_conversations_pets.shape))
df_conversations_pets.head()

Imported conversations_conversation_pets.csv into df_conversations_pets with shape: (120188, 3)


| | id | conversation_id | pet_id |
|---|---|---|---|
| 0 | 1 | 1 | 77730 |
| 1 | 2 | 2 | 77729 |
| 2 | 3 | 3 | 77728 |
| 3 | 4 | 3 | 77727 |
| 4 | 5 | 4 | 77728 |


#### Conversations_message

Each conversation consists of a series of messages. A conversation may contain many messages, but not vice versa. Many of the fields on this table are self-explanatory, but we have detailed a few below.
* conversation_id - This foreign key reports the conversation in
conversations_conversation that this message is part of.
* sender_id - This foreign key reports the user in people_person that sent this message.
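Each message belongs to exactly one conversation, so per-conversation statistics come from grouping on conversation_id. The sample output below shows that `id` order does not always match send order: message 2 was sent after message 3. This sketch therefore assumes the `sent` timestamp, not `id`, is the reliable ordering. The toy rows are taken from that sample, and `message_counts` and `first_sender` are illustrative names.

```python
import pandas as pd

# Toy message rows; note ids 2 and 3 are out of send order.
messages = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "sent": pd.to_datetime([
        "2018-07-16 10:17:53", "2018-07-28 16:53:53", "2018-07-16 23:55:22",
        "2018-07-23 10:05:49", "2018-08-01 10:20:48",
    ]),
    "conversation_id": [1, 1, 1, 1, 2],
    "sender_id": [64393, 2709, 2709, 2709, 64392],
})

# Number of messages in each conversation.
message_counts = messages.groupby("conversation_id").size()

# Sender of the earliest message in each conversation (ordered by timestamp, not id).
first_sender = (
    messages.sort_values("sent")
    .groupby("conversation_id")["sender_id"]
    .first()
)
```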

In [84]:
df_conversations_message = pd.read_csv('Data/conversations_message.csv', low_memory=False, parse_dates=['sent'])
print("Imported conversations_message.csv into df_conversations_message with shape: {}".format(df_conversations_message.shape))
df_conversations_message.head()

Imported conversations_message.csv into df_conversations_message with shape: (401211, 5)


| | id | sent | content | conversation_id | sender_id |
|---|---|---|---|---|---|
| 0 | 1 | 2018-07-16 10:17:53.460035 | Massa class. | 1 | 64393 |
| 1 | 2 | 2018-07-28 16:53:53.927200 | Porta lorem ornare condimentum. | 1 | 2709 |
| 2 | 3 | 2018-07-16 23:55:22.904038 | Neque curae rutrum elit conubia metus in. | 1 | 2709 |
| 3 | 4 | 2018-07-23 10:05:49.829926 | Donec etiam gravida luctus tellus phasellus ri... | 1 | 2709 |
| 4 | 5 | 2018-08-01 10:20:48.626868 | Risus class dui leo sem dui sed sollicitudin. | 2 | 64392 |


#### Conversations_review

If a booking occurs, then either participant can leave a review for the experience. This table records those reviews, which consist of a brief statement and a star rating. Many of the fields on this table are self-explanatory, but we have detailed a few below.
* conversation_id - This foreign key reports the booking in conversations_conversation to
which this review pertains.
* reviewer_id - This foreign key reports the user in people_person that wrote this review.

In [85]:
df_conversations_review = pd.read_csv('Data/conversations_review.csv', low_memory=False)
print("Imported conversations_review.csv into df_conversations_review with shape: {}".format(df_conversations_review.shape))
df_conversations_review.head()

Imported conversations_review.csv into df_conversations_review with shape: (28561, 5)


| | id | content | stars | conversation_id | reviewer_id |
|---|---|---|---|---|---|
| 0 | 1 | Netus proin per duis dolor venenatis nam. | 1 | 7 | 64386 |
| 1 | 2 | Dolor proin donec phasellus ve suspendisse ac ... | 5 | 9 | 64384 |
| 2 | 3 | Proin ipsum urna nisl egestas justo class a ar... | 5 | 11 | 64382 |
| 3 | 4 | Porta velit lectus varius donec tellus sollici... | 1 | 13 | 64381 |
| 4 | 5 | Dolor felis. | 2 | 15 | 64379 |


#### People_person

This table details each user on our site. This table may contain dog owners, dog sitters, or people who have not transacted on our site. Many of the fields on this table are self-explanatory, but we have detailed a few below.
* channel - This field reports how this user discovered our site when they signed up.
* date_joined - The timestamp for when this user signed up.
* fee - When a user books a service as a dog owner, we charge the owner a separate
service fee that takes the form of a percentage of the booking total.

In [86]:
df_people_person = pd.read_csv('Data/people_person.csv', low_memory=False, parse_dates=['date_joined'])
print("Imported people_person.csv into df_people_person with shape: {}".format(df_people_person.shape))
df_people_person.head()

Imported people_person.csv into df_people_person with shape: (64393, 9)


| | id | first_name | last_name | email | channel | date_joined | photo | fee | gender |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Leanora | Allcock | leanora.allcock635@hotmail.com | | 2016-08-02 14:59:15.095591 | https://placekitten.com/242/269 | 0.0 | f |
| 1 | 2 | Elroy | Blanding | elroy.blanding510@yahoo.com | | 2016-08-02 18:15:30.105940 | https://placekitten.com/373/320 | 0.0 | m |
| 2 | 3 | Jeanice | Aleman | jeanice.aleman392@hotmail.com | | 2016-08-02 16:11:09.542004 | https://placekitten.com/238/264 | 0.0 | f |
| 3 | 4 | Tamala | Polhamus | tamala.polhamus146@aol.com | | 2016-08-02 18:02:40.389299 | https://placekitten.com/220/223 | 0.0 | f |
| 4 | 5 | Alethea | Gubler | alethea.gubler708@aol.com | | 2016-08-02 14:31:53.163034 | https://placekitten.com/284/339 | 0.0 | f |


#### People_testsegmentation

Occasionally, this company would run an A/B test that required users to be placed in two groups. This table provides a log for experiments that require user-level segmentation. Many of the fields on this table are self-explanatory, but we have detailed a few below.
* person_id - This foreign key reports the people_person record that was segmented.
* test_name - Multiple tests were run on this site and all are logged on this table. Use this
column to filter to the correct experiment.
* test_group - For the purposes of the experiment in test_name, the user given by
person_id was segmented into the group named in this column (e.g., holdout, variant, A,
B, etc.).
* added - The timestamp reporting the time when this user was segmented.
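Because every experiment is logged in this one table, group sizes only make sense after filtering on test_name. The sketch below uses made-up rows. "Email Test" is the only test name visible in the sample output that follows, and "Other Test" is a hypothetical placeholder. The in-block assertion encodes an assumption, not a documented guarantee: within one test, each person lands in exactly one group.

```python
import pandas as pd

# Toy segmentation log; "Other Test" is a hypothetical second experiment.
segmentation = pd.DataFrame({
    "test_name": ["Email Test", "Email Test", "Email Test", "Other Test", "Other Test"],
    "test_group": ["holdout", "variant", "holdout", "A", "B"],
    "person_id": [1, 2, 3, 1, 2],
})

# Keep only the experiment of interest, then count distinct users per group.
email_test = segmentation[segmentation["test_name"] == "Email Test"]
group_sizes = email_test.groupby("test_group")["person_id"].nunique()

# Assumed invariant: within one test, each person sits in exactly one group.
groups_per_person = email_test.groupby("person_id")["test_group"].nunique()
assert (groups_per_person == 1).all()
```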

In [87]:
df_people_test = pd.read_csv('Data/people_testsegmentation.csv', low_memory=False, parse_dates=['added'])
print("Imported people_testsegmentation.csv into df_people_test with shape: {}".format(df_people_test.shape))
df_people_test.head()

Imported people_testsegmentation.csv into df_people_test with shape: (87778, 5)


| | id | test_name | test_group | added | person_id |
|---|---|---|---|---|---|
| 0 | 1 | Email Test | holdout | 2016-08-02 14:59:15.095591 | 1 |
| 1 | 2 | Email Test | variant | 2016-08-02 18:15:30.105940 | 2 |
| 2 | 3 | Email Test | holdout | 2016-08-02 16:11:09.542004 | 3 |
| 3 | 4 | Email Test | holdout | 2016-08-02 18:02:40.389299 | 4 |
| 4 | 5 | Email Test | holdout | 2016-08-02 14:31:53.163034 | 5 |


#### Pets_pet

This table details each pet that a user has added to their profile. One owner may have more than one pet, but not vice versa. Many of the fields on this table are self-explanatory, but we have detailed a few below.
* description - A short (lorem ipsum) description of the pet.
* plays_cats - If 1, then this pet plays well with cats.
* plays_children - If 1, then this pet plays well with children.
* plays_dogs - If 1, then this pet plays well with dogs.
* spayed_neutered - If 1, then this pet has been spayed or neutered.
* house_trained - If 1, then this pet is house trained.
* owner_id - This foreign key reports the people_person record for this pet’s owner.
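The behavior flags are stored as 0/1 integers, so the share of pets with a trait is simply the column mean. This is the same quantity that question 5 in Section I computes with sum/len. The sketch uses the four sample rows shown below. It assumes the flags contain no nulls: `mean()` skips nulls while `len()` counts them, so the two approaches would diverge.

```python
import pandas as pd

# Flag values from the first four sample pets.
pets = pd.DataFrame({
    "plays_cats": [1, 0, 0, 1],
    "plays_children": [1, 1, 1, 1],
    "plays_dogs": [1, 1, 1, 1],
    "spayed_neutered": [1, 1, 1, 0],
    "house_trained": [1, 0, 0, 0],
})

flag_columns = ["plays_cats", "plays_children", "plays_dogs",
                "spayed_neutered", "house_trained"]

# For 0/1 columns, mean() is the fraction of pets with the flag set.
flag_pct = (pets[flag_columns].mean() * 100).round(2)
print(flag_pct)
```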

In [88]:
df_pets = pd.read_csv('Data/pets_pet.csv', low_memory=False, parse_dates=['birthday'])
print("Imported pets_pet.csv into df_pets with shape: {}".format(df_pets.shape))
df_pets.head()

Imported pets_pet.csv into df_pets with shape: (77730, 13)


| | id | name | description | gender | weight | birthday | plays_cats | plays_children | plays_dogs | spayed_neutered | house_trained | size | owner_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Jammie | Morbi fames a mauris elit malesuada platea. | f | 76 | 2016-05-26 | 1 | 1 | 1 | 1 | 1 | large | 12601 |
| 1 | 2 | Lonnie | Class magna a libero felis sociosqu. | f | 12 | 2014-05-20 | 0 | 1 | 1 | 1 | 0 | small | 12602 |
| 2 | 3 | Emely | Felis class. | m | 11 | 2014-08-21 | 0 | 1 | 1 | 1 | 0 | small | 12602 |
| 3 | 4 | Emelia | Fames class egestas mollis risus posuere. | f | 35 | 2013-09-23 | 1 | 1 | 1 | 0 | 0 | medium | 12603 |
| 4 | 5 | Jami | Netus augue a congue orci. | m | 35 | 2014-05-13 | 0 | 1 | 1 | 1 | 1 | medium | 12603 |


#### Services_service

On our site, users may offer pet care services. This table stores a record for each service that is offered. Each user can offer more than one service, but not more than one of each type. Many of the fields on this table are self-explanatory, but we have detailed a few below.
* max_dogs - This number is the maximum number of pets this provider would prefer to
care for.
* fee - When a user books with a service, we take a percentage of the booking total. This
field reports the percentage.
* provider_id - This foreign key reports the people_person record for this service’s
provider.
* added - A timestamp for when this service became active.
* price - The price per unit booked.
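Read plainly, the two fee fields combine as follows:

* The owner pays the booking total plus their own service fee (people_person.fee).
* The platform keeps services_service.fee of the booking total from the provider's side.

The sketch below only illustrates that arithmetic under this reading. The `split_booking` helper and the example rates are hypothetical, and this is not a confirmed description of how revenue is actually computed.

```python
def split_booking(booking_total: float, owner_fee: float, service_fee: float) -> dict:
    """Split one booking's money flows; fees are fractions (0.15 means 15%).

    Hypothetical helper, assuming both fees are applied to booking_total.
    """
    owner_pays = booking_total * (1 + owner_fee)           # total plus owner service fee
    provider_receives = booking_total * (1 - service_fee)  # total minus the service-side cut
    platform_keeps = owner_pays - provider_receives        # = total * (owner_fee + service_fee)
    return {
        "owner_pays": round(owner_pays, 2),
        "provider_receives": round(provider_receives, 2),
        "platform_keeps": round(platform_keeps, 2),
    }

# Example: a $120 booking with a 15% service fee and a made-up 5% owner fee.
print(split_booking(120, owner_fee=0.05, service_fee=0.15))
```

With these inputs the split works out to 126.0 / 102.0 / 24.0, and 24.0 is 20% of the $120 booking total.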

In [89]:
df_services = pd.read_csv('Data/services_service.csv', low_memory=False, parse_dates=['added'])
print("Imported services_service.csv into df_services with shape: {}".format(df_services.shape))
df_services.head()

Imported services_service.csv into df_services with shape: (21398, 16)


| | id | service_type | cancellation_policy | can_provide_oral_medication | can_provide_injected_medication | senior_dog_experience | special_needs_experience | takes_small_dogs | takes_medium_dogs | takes_large_dogs | takes_puppies | max_dogs | provider_id | fee | price | added |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | boarding | strict | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 4 | 1 | 0.15 | 35 | 2016-08-02 14:59:15.095591 |
| 1 | 2 | dog-walking | strict | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 5 | 1 | 0.15 | 26 | 2016-08-02 14:59:15.095591 |
| 2 | 3 | boarding | moderate | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 2 | 2 | 0.15 | 31 | 2016-08-02 18:15:30.105940 |
| 3 | 4 | dog-walking | strict | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 5 | 2 | 0.15 | 27 | 2016-08-02 18:15:30.105940 |
| 4 | 5 | day-care | strict | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 5 | 2 | 0.15 | 30 | 2016-08-02 18:15:30.105940 |


<a id='1'></a>
## I. Exploring the Database
1. How many users have signed up?  
*The answer is 64393.*
2. How many users signed up prior to 2018-02-03?  
*The answer is 35826.*
3. What percentage of users have added pets?  
*The answer is 80.43%.*
4. Of those users, how many pets have they added on average?  
*The answer is 1.501.*
5. What percentage of pets play well with cats?  
*The answer is 24.85%.*

In [90]:
# 1. How many users have signed up?
df_people_person['id'].nunique()

64393

In [91]:
# 2. How many users signed up prior to 2018-02-03?
(df_people_person['date_joined'] < pd.Timestamp(2018, 2, 3)).sum()

35826

In [92]:
# 3. What percentage of users have added pets?
round(df_pets['owner_id'].nunique()/df_people_person['id'].nunique() * 100, 2)

80.43

In [93]:
# 4. Of those users, how many pets have they added on average?
round(len(df_pets)/df_pets['owner_id'].nunique(), 3)

1.501

In [94]:
# 5. What percentage of pets play well with cats?
round(df_pets['plays_cats'].sum()/len(df_pets) * 100, 2)

24.85

<a id='2'></a>
## II. Conversations and Bookings

Some users can offer pet care services. When an owner needs pet care, they can create a conversation with another user that offers the service they are interested in. After exchanging some messages and possibly meeting in person, that conversation hopefully books. In that case, services are paid for and delivered. Occasionally, some conversations that have booked may be cancelled. Lastly, for un-cancelled bookings, both owners and sitters have the option of leaving a review.  In the following questions, we explore these concepts.
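The lifecycle above maps onto the nullable timestamps in conversations_conversation:

* booked_at stays null until a conversation books.
* cancelled_at stays null unless a booking is cancelled.

The minimal funnel sketch below works on toy rows. The `stage` labels are illustrative and are not a column in the data.

```python
import numpy as np
import pandas as pd

# Toy conversations: null booked_at = never booked; null cancelled_at = not cancelled.
conversations = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "booked_at": pd.to_datetime([None, "2018-06-27", "2018-07-22", None, "2018-06-07"]),
    "cancelled_at": pd.to_datetime([None, None, "2018-07-25", None, None]),
})

booked = conversations["booked_at"].notna()
cancelled = conversations["cancelled_at"].notna()

# Label each conversation by how far it got; the first matching condition wins.
conversations["stage"] = np.select(
    [booked & cancelled, booked],
    ["cancelled", "booked"],
    default="not booked",
)

booking_rate = booked.mean()                  # share of conversations that booked
cancellation_rate = cancelled[booked].mean()  # share of bookings later cancelled
```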

1. For un-cancelled bookings, is the owner or provider more likely to leave a review and
which tends to leave better reviews? How would you narrate this finding to a business partner?

First, let's get all the uncancelled bookings.

In [95]:
uncancelled_bookings = df_conversations[df_conversations['booked_at'].notnull() & df_conversations['cancelled_at'].isnull()]
uncancelled_bookings.head()

| | id | start_date | end_date | units | added | booking_total | cancellation_fault | requester_id | service_id | booked_at | cancelled_at |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 5 | 6 | 2018-07-04 | 2018-07-07 | 3 | 2018-06-23 16:16:16.891344 | 78 | | 64388 | 14772 | 2018-06-27 14:39:51.433544 | NaT |
| 6 | 7 | 2018-07-31 | 2018-08-04 | 4 | 2018-07-21 04:55:57.951572 | 100 | | 64386 | 14783 | 2018-07-22 02:50:20.676664 | NaT |
| 8 | 9 | 2018-07-02 | 2018-07-03 | 1 | 2018-06-21 06:23:08.092389 | 23 | | 64384 | 7656 | 2018-06-23 01:49:33.596667 | NaT |
| 10 | 11 | 2018-06-12 | 2018-06-18 | 6 | 2018-06-03 09:11:33.421951 | 300 | | 64382 | 2512 | 2018-06-07 01:24:20.187356 | NaT |
| 12 | 13 | 2018-07-24 | 2018-07-27 | 3 | 2018-07-13 10:30:05.875588 | 126 | | 64381 | 1634 | 2018-07-19 01:01:11.552880 | NaT |


In [96]:
df_conversations_review.head()

| | id | content | stars | conversation_id | reviewer_id |
|---|---|---|---|---|---|
| 0 | 1 | Netus proin per duis dolor venenatis nam. | 1 | 7 | 64386 |
| 1 | 2 | Dolor proin donec phasellus ve suspendisse ac ... | 5 | 9 | 64384 |
| 2 | 3 | Proin ipsum urna nisl egestas justo class a ar... | 5 | 11 | 64382 |
| 3 | 4 | Porta velit lectus varius donec tellus sollici... | 1 | 13 | 64381 |
| 4 | 5 | Dolor felis. | 2 | 15 | 64379 |


We need to join the reviews to the uncancelled bookings. Once we do that, we only need to keep requester_id, reviewer_id, and stars to determine the differences between owners and providers.
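An inner merge on `id`/`conversation_id` keeps only reviews whose conversation is in the uncancelled-bookings frame. Reviews attached to any other conversation drop out silently. One way to count the dropped rows is pandas' `indicator=True` merge option. The `validate="many_to_one"` option additionally checks that each booking id is unique on the right side. The toy rows below are made up, and conversation 12 stands in for a conversation that is not an uncancelled booking.

```python
import pandas as pd

bookings = pd.DataFrame({"id": [7, 9, 11], "requester_id": [64386, 64384, 64382]})
reviews = pd.DataFrame({
    "conversation_id": [7, 9, 9, 12],   # 12 is not an uncancelled booking
    "reviewer_id": [64386, 64384, 2709, 64380],
    "stars": [1, 5, 4, 3],
})

# Left-merge from the review side and flag which reviews found a matching booking.
check = reviews.merge(bookings, left_on="conversation_id", right_on="id",
                      how="left", indicator=True)
unmatched = (check["_merge"] == "left_only").sum()

# The inner join keeps only matched reviews; validate raises if booking ids repeat.
matched = reviews.merge(bookings, left_on="conversation_id", right_on="id",
                        how="inner", validate="many_to_one")
```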

In [97]:
df_reviews = pd.merge(uncancelled_bookings, df_conversations_review,
                      left_on='id', right_on='conversation_id')[['requester_id', 'reviewer_id', 'stars']]
df_reviews.head()

| | requester_id | reviewer_id | stars |
|---|---|---|---|
| 0 | 64386 | 64386 | 1 |
| 1 | 64384 | 64384 | 5 |
| 2 | 64382 | 64382 | 5 |
| 3 | 64381 | 64381 | 1 |
| 4 | 64379 | 64379 | 2 |


If the person leaving the review is also the person who requested the service, it is an owner's review; otherwise, we treat it as the provider's review.

In [98]:
df_reviews['reviewer_type'] = np.where(df_reviews['reviewer_id'] == df_reviews['requester_id'], 'owner', 'provider')
df_reviews.head(5)

| | requester_id | reviewer_id | stars | reviewer_type |
|---|---|---|---|---|
| 0 | 64386 | 64386 | 1 | owner |
| 1 | 64384 | 64384 | 5 | owner |
| 2 | 64382 | 64382 | 5 | owner |
| 3 | 64381 | 64381 | 1 | owner |
| 4 | 64379 | 64379 | 2 | owner |


In [99]:
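# Select the stars column before aggregating; as_index=False is not needed and changes the result shape in newer pandas.
df_reviews.groupby('reviewer_type')['stars'].agg(['count', 'mean'])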

| reviewer_type | count | mean |
|---|---|---|
| owner | 22499 | 4.415841 |
| provider | 6062 | 3.904817 |


**Answer:**  

As the table above shows, owners left 22,499 reviews on uncancelled bookings, compared to 6,062 from providers. Because each booking has exactly one owner and one provider, this means owners are about 3.7 times as likely as providers to leave a review. Owners also give higher ratings on average: around 4.42 stars compared to 3.90 stars from providers, or roughly half a star higher.
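The per-booking review rate for each side can also be computed directly instead of being inferred from counts. The sketch below assumes at most one review per participant per booking. It uses toy data, and `n_bookings` is a made-up number of uncancelled bookings. It relies on pandas named aggregation (available since pandas 0.25).

```python
import pandas as pd

# Toy data: 4 uncancelled bookings and the reviews left on them.
n_bookings = 4
reviews = pd.DataFrame({
    "conversation_id": [1, 2, 3, 4, 2],
    "reviewer_type": ["owner", "owner", "owner", "owner", "provider"],
    "stars": [5, 4, 5, 3, 4],
})

# Review count and mean rating per side, then the share of bookings each side reviewed.
summary = reviews.groupby("reviewer_type").agg(
    reviews=("stars", "count"),
    mean_stars=("stars", "mean"),
)
summary["review_rate"] = summary["reviews"] / n_bookings
print(summary)
```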

To narrate this finding to a business partner, I would use the simple chart below to show the gap between how often owners and providers leave reviews. For the average rating, the table above showing the mean rating by reviewer type is enough; two averages are easily communicated in text without a chart.

In [100]:
df_reviews.groupby('reviewer_type')['stars'].count().iplot(
                    kind='bar', filename='rover-review-counts', yTitle='Number of Reviews', title='Number of Reviews by Reviewer Type',
                    color='green')

<a id='3'></a>
## III. Recent Daily Booking Rate

The snapshot of this database was taken on 2018-08-02 at midnight and only contains data reflecting events prior to that date. A junior analyst is investigating daily booking rate during the days prior to the snapshot and is concerned about an apparent downward trend. You are tasked with helping them out.
1. First, let's reproduce their results. They tell you that daily booking rate is defined to be the percentage of conversations created each day that eventually book. What is the daily booking rate for each of the 90 days prior to the snapshot? Is there a downward trend?

2. Can you narrate a reason why this trend exists? Is there a reason to be concerned?

In [101]:
# .copy() makes df_temp an independent frame, so adding a column below does not raise SettingWithCopyWarning.
df_temp = df_conversations[['added', 'booked_at']].copy()
df_temp.head()

| | added | booked_at |
|---|---|---|
| 0 | 2018-07-16 10:17:53.460035 | NaT |
| 1 | 2018-08-01 10:20:48.626868 | NaT |
| 2 | 2018-06-05 16:46:39.542467 | NaT |
| 3 | 2018-07-02 09:12:22.275923 | NaT |
| 4 | 2018-06-21 16:02:48.694725 | NaT |


In [102]:
df_temp['was_booked'] = df_temp['booked_at'].notnull().astype(int)
df_temp.head(5)

| | added | booked_at | was_booked |
|---|---|---|---|
| 0 | 2018-07-16 10:17:53.460035 | NaT | 0 |
| 1 | 2018-08-01 10:20:48.626868 | NaT | 0 |
| 2 | 2018-06-05 16:46:39.542467 | NaT | 0 |
| 3 | 2018-07-02 09:12:22.275923 | NaT | 0 |
| 4 | 2018-06-21 16:02:48.694725 | NaT | 0 |


In [103]:
df_temp.dtypes

added         datetime64[ns]
booked_at     datetime64[ns]
was_booked             int64
dtype: object

In [104]:
# We only need data from the previous 90 days.
# I'm using a static cutoff date for this problem, but it could be made dynamic to update automatically when new data is available.
cutoff_date = datetime.date(2018,5,3)
# Filter on the conversation creation date ('added'); df_temp has no start_date column.
df_temp = df_temp[df_temp['added'].dt.date > cutoff_date]
df_temp.head()
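The comment in the cell above notes that the cutoff could be derived automatically. The sketch below assumes the snapshot is the midnight following the latest `added` timestamp. That holds here only if the data runs into the day before the snapshot, as the 2018-08-01 conversation in the sample suggests. The helper name `ninety_day_window` is hypothetical.

```python
import pandas as pd

def ninety_day_window(added: pd.Series, days: int = 90):
    """Return (window_start, snapshot) covering the last `days` full days of data.

    Hypothetical helper; assumes the snapshot is midnight after the latest timestamp.
    """
    snapshot = added.max().normalize() + pd.Timedelta(days=1)
    window_start = snapshot - pd.Timedelta(days=days)
    return window_start, snapshot

added = pd.Series(pd.to_datetime(["2018-06-05 16:46", "2018-08-01 10:20"]))
start, snapshot = ninety_day_window(added)
print(start, snapshot)
```

Filtering with `added >= start` and `added < snapshot` then selects exactly 90 calendar days, 2018-05-04 through 2018-08-01. This matches the static rule of keeping days strictly after 2018-05-03.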