# Introduction

Stuart is the leading on-demand solution for transporting goods in a customised way.
We connect businesses of all sizes, across all industries, with high-quality
independent couriers to offer tailored delivery solutions.

As a data analyst at Stuart, your role is to abstract the complexity of the business and the
underlying data to provide meaningful insights to decision makers. To assess your suitability
for the role, we’re providing you with a realistic data set and questions similar to what you
may encounter in day-to-day work at Stuart.


# Data
The data set attached consists of records of the principal business events generated from
the moment a package appears in our system until it is delivered to its final destination.

There are 22 tables in the data set recording events around 5 principal objects in Stuart’s
universe: packages, deliveries, tasks, drivers and invitations.

- Packages are the physical articles that need to be delivered and can vary in size,
type and urgency
- Deliveries represent the job a driver needs to perform to deliver one or more
packages
- Tasks are the different segments of a delivery. For example, a simple delivery
consists of two tasks: a pickup task and a drop-off task
- Drivers are the people performing the deliveries and can do so using different
transport types
- Invitations are the outcome of our dispatching algorithms to connect free Drivers to
Deliveries

The data is provided in SQLite format and can be queried using any of the free SQLite
clients such as DB Browser or TablePlus.

Please note that no data dictionary is provided and you’ll need to make sense of the data
based on the table names and the above information.
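
Since no data dictionary is provided, a first step could be to list every table and its columns directly from the database. The sketch below is one possible way to do that with Python's built-in `sqlite3` module. The helper name `describe_schema` and the stand-in `package` table with its columns are assumptions made for illustration; with the real data set, the connection would point at the provided file instead of an in-memory database.

```python
import sqlite3


def describe_schema(conn):
    """Return {table_name: [(column_name, declared_type), ...]} for every table."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    schema = {}
    for table in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        schema[table] = [(col[1], col[2]) for col in columns]
    return schema


# Stand-in database so the sketch runs on its own (hypothetical columns).
# With the real data set, replace ":memory:" with the path to the SQLite file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE package (package_id INTEGER, zone_name TEXT)")

schema = describe_schema(conn)
for table, columns in schema.items():
    print(table, columns)
```

Reading the table and column names side by side this way makes it easier to guess which event tables belong to which of the five objects described above.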


## Question 1
What is the success rate of packages per Zone? (% of Packages delivered on time)

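-- SQL query
-- Share of delivered packages whose delivery event falls on or before
-- time_window_do_end (assumed to be the end of the drop-off time window).
-- Packages without a package_delivered row are excluded by the inner join.
SELECT
    pd.zone_id,
    p.zone_name,
    ROUND(SUM(CASE WHEN pd.created_at <= time_window_do_end THEN 1 ELSE 0 END) * 100.0
          / COUNT(p.created_at), 1) AS success_rate
FROM package p
INNER JOIN package_delivered pd
    ON pd.package_id = p.package_id
GROUP BY pd.zone_id, p.zone_name
ORDER BY success_rate DESC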
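
To sanity-check the on-time logic, the same conditional aggregation can be run against a tiny hand-made data set where the expected answer is obvious. The sketch below is only an illustration: the table layouts are assumptions (in particular, that `time_window_do_end` lives on `package`), the rows are invented, and timestamps are stored as ISO-8601 text so that string comparison matches chronological order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical, simplified versions of the two tables used in the query.
    CREATE TABLE package (package_id INTEGER, zone_name TEXT, time_window_do_end TEXT);
    CREATE TABLE package_delivered (package_id INTEGER, zone_id INTEGER, created_at TEXT);

    INSERT INTO package VALUES
        (1, 'Paris',  '2020-01-01 12:00:00'),
        (2, 'Paris',  '2020-01-01 12:00:00'),
        (3, 'London', '2020-01-01 12:00:00'),
        (4, 'London', '2020-01-01 12:00:00');

    INSERT INTO package_delivered VALUES
        (1, 10, '2020-01-01 11:30:00'),  -- on time
        (2, 10, '2020-01-01 12:30:00'),  -- late
        (3, 20, '2020-01-01 11:00:00'),  -- on time
        (4, 20, '2020-01-01 11:59:00');  -- on time
    """
)

rows = conn.execute(
    """
    SELECT
        pd.zone_id,
        p.zone_name,
        ROUND(SUM(CASE WHEN pd.created_at <= p.time_window_do_end THEN 1 ELSE 0 END) * 100.0
              / COUNT(*), 1) AS success_rate
    FROM package p
    INNER JOIN package_delivered pd ON pd.package_id = p.package_id
    GROUP BY pd.zone_id, p.zone_name
    ORDER BY success_rate DESC
    """
).fetchall()

print(rows)  # [(20, 'London', 100.0), (10, 'Paris', 50.0)]
```

Note that the denominator here only contains packages that were actually delivered. If undelivered packages should count as failures, the join would need to start from all packages with a `LEFT JOIN` instead.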

## Question 2
Is the driver invitation acceptance rate higher or lower among drivers who have been
on the platform for longer?

-- SQL query
-- Average per-driver acceptance rate, bucketed by time on the platform.
-- Each driver weighs equally in the average, however many invitations they received.
-- Assumes at most one invitation_accepted row per invitation.
SELECT
    ROUND(AVG(a.rate_accepted), 0) AS avg_rate_accepted,
    a.years_worked
FROM (
    SELECT
        d.driver_id,
        COUNT(ia.delivery_invitation_id) AS total_invites_accepted,
        COUNT(ic.delivery_invitation_id) AS total_invites_created,
        COUNT(ia.delivery_invitation_id) * 100.0
            / COUNT(ic.delivery_invitation_id) AS rate_accepted,
        julianday('now') - julianday(d.created_at) AS days_on_platform,
        -- CASE returns the first matching branch, so each upper bound is enough.
        CASE
            WHEN julianday('now') - julianday(d.created_at) < 365   THEN '0-1 Year'
            WHEN julianday('now') - julianday(d.created_at) <= 730  THEN '1-2 Years'
            WHEN julianday('now') - julianday(d.created_at) <= 1095 THEN '2-3 Years'
            WHEN julianday('now') - julianday(d.created_at) <= 1460 THEN '3-4 Years'
            WHEN julianday('now') - julianday(d.created_at) <= 1825 THEN '4-5 Years'
            WHEN julianday('now') - julianday(d.created_at) <= 2190 THEN '5-6 Years'
            WHEN julianday('now') - julianday(d.created_at) <= 2555 THEN '6-7 Years'
            WHEN julianday('now') - julianday(d.created_at) > 2555  THEN '7+ Years'
        END AS years_worked
    FROM driver d
    INNER JOIN invitation_created ic
        ON d.driver_id = ic.driver_id
    LEFT JOIN invitation_accepted ia
        ON ic.delivery_invitation_id = ia.delivery_invitation_id
    GROUP BY d.driver_id
) a
GROUP BY a.years_worked
ORDER BY a.years_worked ASC
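
The tenure bucketing and per-driver acceptance rate can be checked on a small invented data set. The sketch below makes two assumptions that differ from the query above: tenure is measured against a fixed reference date passed as the hypothetical `:as_of` parameter (rather than `'now'`) so the result is reproducible, and buckets are whole years obtained by integer division instead of text labels. Table layouts and rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical, simplified versions of the three tables used in the query.
    CREATE TABLE driver (driver_id INTEGER, created_at TEXT);
    CREATE TABLE invitation_created (delivery_invitation_id INTEGER, driver_id INTEGER);
    CREATE TABLE invitation_accepted (delivery_invitation_id INTEGER);

    INSERT INTO driver VALUES
        (1, '2020-07-01'),  -- under one year before the reference date
        (2, '2019-06-01'),  -- between one and two years
        (3, '2020-10-01');  -- under one year

    INSERT INTO invitation_created VALUES
        (1, 1), (2, 1),
        (3, 3), (4, 3), (5, 3), (6, 3),
        (7, 2), (8, 2), (9, 2), (10, 2);

    -- Driver 1 accepts 1 of 2, driver 3 accepts 3 of 4, driver 2 accepts 1 of 4.
    INSERT INTO invitation_accepted VALUES (1), (3), (4), (5), (7);
    """
)

rows = conn.execute(
    """
    WITH per_driver AS (
        SELECT
            d.driver_id,
            CAST((julianday(:as_of) - julianday(d.created_at)) / 365 AS INTEGER) AS tenure_years,
            COUNT(ia.delivery_invitation_id) * 100.0
                / COUNT(ic.delivery_invitation_id) AS acceptance_rate
        FROM driver d
        INNER JOIN invitation_created ic ON ic.driver_id = d.driver_id
        LEFT JOIN invitation_accepted ia
            ON ia.delivery_invitation_id = ic.delivery_invitation_id
        GROUP BY d.driver_id, d.created_at
    )
    SELECT tenure_years, ROUND(AVG(acceptance_rate), 1) AS avg_acceptance_rate
    FROM per_driver
    GROUP BY tenure_years
    ORDER BY tenure_years
    """,
    {"as_of": "2021-01-01"},
).fetchall()

print(rows)  # [(0, 62.5), (1, 25.0)]
```

Averaging per-driver rates gives every driver the same weight. Dividing total accepted invitations by total created invitations within each bucket would instead weight drivers by invitation volume, and the two can lead to different conclusions.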

