![Instacart](https://raw.githubusercontent.com/interviewquery/takehomes/instacart_1/instacart_1/logo.png)
# Data Analyst Challenge

## Directions

We'd love for you to analyze the data in `data.csv` and share what you find. We know
that you don't know much about how our team is currently run, but that's
okay. This data set includes information on orders, order location,
customer ratings, and any issues reported by the customer for a set of
orders.

1. Please analyze the data in `data.csv` and share with us:

    1. Any observations about our business.

    1. How you would staff the Customer Support Team.

Please compile your analysis into a document or deck to convey your
findings. Use the data set as necessary to substantiate your claims.


In [1]:
!git clone --branch instacart_1 https://github.com/interviewquery/takehomes.git
%cd takehomes/instacart_1
!ls

Cloning into 'takehomes'...
remote: Enumerating objects: 1968, done.
remote: Counting objects: 100% (1968/1968), done.
remote: Compressing objects: 100% (1222/1222), done.
remote: Total 1968 (delta 755), reused 1933 (delta 729), pack-reused 0 (from 0)
Receiving objects: 100% (1968/1968), 299.41 MiB | 18.21 MiB/s, done.
Resolving deltas: 100% (755/755), done.
/content/takehomes/instacart_1
data.csv  logo.png  metadata.json  takehomefile.ipynb


In [2]:
# Libraries for data loading, numeric work, and plotting
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [3]:
df = pd.read_csv('data.csv')
df.head(5)

Unnamed: 0,order delivery time,order id,customer order rating,type of issue reported,region
0,2014-06-02 04:23:16 UTC,233599337,5,,chi
1,2014-06-02 03:57:50 UTC,233599376,5,,chi
2,2014-06-02 02:52:38 UTC,233599328,5,,chi
3,2014-06-02 02:52:04 UTC,233599070,5,,chi
4,2014-06-02 02:41:43 UTC,233599100,5,,chi


In [4]:
df.describe()

Unnamed: 0,order id,customer order rating
count,14957.0,14957.0
mean,104111800.0,4.5582
std,115978300.0,1.002157
min,208056.0,0.0
25%,232982.0,5.0
50%,245829.0,5.0
75%,233589000.0,5.0
max,233614700.0,5.0


In [6]:
# !pip install sqlalchemy
from sqlalchemy import create_engine



In [10]:
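# Write the DataFrame to a local SQLite file so it can be queried with SQL.
engine = create_engine("sqlite:///customer_data.sqlite")

df.to_sql("customer_data", engine, if_exists="replace", index=False)
# Read the table back to confirm the round trip.
df_query = pd.read_sql("SELECT * FROM customer_data", engine)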

In [11]:
df_query

Unnamed: 0,order delivery time,order id,customer order rating,type of issue reported,region
0,2014-06-02 04:23:16 UTC,233599337,5,,chi
1,2014-06-02 03:57:50 UTC,233599376,5,,chi
2,2014-06-02 02:52:38 UTC,233599328,5,,chi
3,2014-06-02 02:52:04 UTC,233599070,5,,chi
4,2014-06-02 02:41:43 UTC,233599100,5,,chi
...,...,...,...,...,...
14952,2014-05-07 20:29:32 +0000,233614661,0,,sf
14953,2014-05-05 23:59:17 +0000,233614666,0,,sf
14954,2014-05-04 22:48:29 +0000,233614671,0,,sf
14955,2014-05-03 17:41:36 +0000,233614676,0,,sf


In [14]:
distinct_regions_df = pd.read_sql("SELECT DISTINCT region FROM customer_data", engine)
display(distinct_regions_df)

Unnamed: 0,region
0,chi
1,nyc
2,sf
3,sf
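
The distinct-region query returns `sf` twice, which suggests that some rows carry a variant of the value, such as stray whitespace or different casing. That cause is an assumption; the raw values are not visible in the output above. A minimal sketch of how the variants could be exposed and collapsed in SQL, using a small hypothetical in-memory table in place of `customer_data`:

```python
import sqlite3

# Hypothetical sample rows standing in for customer_data; the trailing
# space in 'sf ' is an assumed cause of the duplicate, not a confirmed one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_data (region TEXT)")
conn.executemany(
    "INSERT INTO customer_data (region) VALUES (?)",
    [("chi",), ("nyc",), ("sf",), ("sf ",)],
)

# Show each raw value with its length so invisible characters stand out.
raw_regions = conn.execute(
    "SELECT DISTINCT region, LENGTH(region) FROM customer_data ORDER BY region"
).fetchall()

# TRIM and LOWER collapse whitespace and casing variants into one value.
clean_regions = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT LOWER(TRIM(region)) AS region "
        "FROM customer_data ORDER BY region"
    )
]
print(raw_regions)
print(clean_regions)
```

The same two queries can be passed to `pd.read_sql(..., engine)` against the notebook's own table to check whether the lengths of the two `sf` values actually differ.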


In [18]:
query = """
SELECT `order id`, count(`order id`) as count
FROM customer_data
GROUP BY `order id`
ORDER BY count DESC
"""
OrderId_df = pd.read_sql(query, engine)
display(OrderId_df)

Unnamed: 0,order id,count
0,233598760,6
1,246371,6
2,237775,6
3,228263,5
4,226403,5
...,...,...
13840,215936,1
13841,215925,1
13842,214101,1
13843,214084,1
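
Some order ids appear up to six times. One possible explanation is that an order with several reported issues gets one row per issue, but the output above does not confirm this. The sketch below, run on hypothetical sample rows, lists each repeated order id together with its row count and the number of distinct issue types, which would help separate multi-issue orders from plain duplicates:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_data ("
    "`order id` INTEGER, `type of issue reported` TEXT)"
)
# Hypothetical rows: order 1 repeats with two different issues,
# order 2 repeats with no issue, order 3 appears once.
conn.executemany(
    "INSERT INTO customer_data VALUES (?, ?)",
    [(1, "Wrong item"), (1, "Item missing"), (2, None), (2, None), (3, None)],
)

# COUNT(DISTINCT column) ignores NULLs, so an order that repeats without
# any reported issue shows distinct_issues = 0.
query = """
SELECT `order id`,
       COUNT(*) AS row_count,
       COUNT(DISTINCT `type of issue reported`) AS distinct_issues
FROM customer_data
GROUP BY `order id`
HAVING COUNT(*) > 1
ORDER BY `order id`
"""
repeated_orders = conn.execute(query).fetchall()
print(repeated_orders)
```

Against the notebook's table, the same `query` string can be run with `pd.read_sql(query, engine)`.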


In [21]:
query = """
SELECT `customer order rating` as rating, count(`customer order rating`) as count
FROM customer_data
GROUP BY `customer order rating`
ORDER BY rating DESC
"""
rating_df = pd.read_sql(query, engine)
display(rating_df)

Unnamed: 0,rating,count
0,5,11602
1,4,1680
2,3,778
3,2,370
4,1,373
5,0,154
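
The raw counts are easier to compare as shares of all rows. The sketch below recomputes the shares and the mean rating from the counts in the output above. It also computes the mean over ratings 1 to 5 only; treating a rating of 0 as "not rated" is an assumption, since the data set does not say what 0 means.

```python
# Rating counts copied from the query output above.
rating_counts = {5: 11602, 4: 1680, 3: 778, 2: 370, 1: 373, 0: 154}

total = sum(rating_counts.values())
shares = {rating: 100 * n / total for rating, n in rating_counts.items()}

# Mean over all rows, and mean over rows with a rating of 1 to 5 only.
# Treating 0 as "not rated" is an assumption, not something the data states.
weighted_sum = sum(rating * n for rating, n in rating_counts.items())
mean_all = weighted_sum / total
rated_total = total - rating_counts[0]
mean_rated = weighted_sum / rated_total

for rating, share in shares.items():
    print(f"rating {rating}: {share:.2f}%")
print(f"mean (all rows): {mean_all:.4f}")
print(f"mean (ratings 1-5 only): {mean_rated:.4f}")
```

The total matches the row count reported by `df.describe()`, and the all-rows mean matches its `mean` for `customer order rating`, which is a quick consistency check on the query.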


In [23]:
query = """
SELECT MIN(`order delivery time`) AS earliest_delivery_time,
       MAX(`order delivery time`) AS latest_delivery_time
FROM customer_data
"""
time_range_df = pd.read_sql(query, engine)
display(time_range_df)

Unnamed: 0,earliest_delivery_time,latest_delivery_time
0,2014-05-01 08:54:00 +0000,2014-06-02 06:28:37 +0000
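
`MIN` and `MAX` here compare the timestamps as text. That works because every value starts with the same fixed-width `YYYY-MM-DD HH:MM:SS` prefix, but the column mixes two suffixes (`UTC` and `+0000`), so converting to real datetimes is safer before doing any time arithmetic. A minimal sketch using only the standard library; `parse_delivery_time` is a hypothetical helper name:

```python
from datetime import datetime

def parse_delivery_time(value: str) -> datetime:
    """Parse 'YYYY-MM-DD HH:MM:SS UTC' or '... +0000' into an aware datetime."""
    # Rewrite the 'UTC' suffix as a numeric offset so one format covers both.
    normalized = value.strip().replace(" UTC", " +0000")
    return datetime.strptime(normalized, "%Y-%m-%d %H:%M:%S %z")

# Two sample values taken from the outputs above, one per suffix style.
first = parse_delivery_time("2014-05-01 08:54:00 +0000")
second = parse_delivery_time("2014-06-02 04:23:16 UTC")
print(second - first)
```

In the notebook, the helper could be applied with `df['order delivery time'].map(parse_delivery_time)` to obtain a column suitable for grouping by hour or weekday.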


In [25]:
query = """
SELECT `type of issue reported`, COUNT(*) as count
FROM customer_data
GROUP BY `type of issue reported`
ORDER BY count DESC
"""
issue_counts_df = pd.read_sql(query, engine)
display(issue_counts_df)

Unnamed: 0,type of issue reported,count
0,,13870
1,Wrong item,374
2,Damaged or spoiled,310
3,Item missing,178
4,Poor service,129
5,Poor replacement,54
6,Other Order Issue,21
7,Item charged incorrectly,21
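
Per the counts above, 1,087 of the 14,957 rows carry a reported issue. For the staffing question, it also helps to know when and where those issues occur. The sketch below counts orders and reported issues per region and hour on hypothetical sample rows. It assumes the delivery time is a reasonable proxy for when support is contacted (the data set has no issue-report timestamp), and the hours are in UTC, not local time for each region.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_data ("
    "`order delivery time` TEXT, `type of issue reported` TEXT, region TEXT)"
)
# Hypothetical sample rows in the two timestamp styles seen in the data.
conn.executemany(
    "INSERT INTO customer_data VALUES (?, ?, ?)",
    [
        ("2014-06-02 04:23:16 UTC", "Wrong item", "chi"),
        ("2014-06-02 04:50:00 UTC", "Item missing", "chi"),
        ("2014-06-02 03:57:50 UTC", None, "chi"),
        ("2014-05-07 20:29:32 +0000", "Poor service", "sf"),
    ],
)

# Characters 12-13 of the timestamp are the hour in both formats.
# COUNT(column) skips NULLs, so it counts only rows with a reported issue.
query = """
SELECT region,
       SUBSTR(`order delivery time`, 12, 2) AS utc_hour,
       COUNT(*) AS orders,
       COUNT(`type of issue reported`) AS issues
FROM customer_data
GROUP BY region, utc_hour
ORDER BY region, utc_hour
"""
issues_by_hour = conn.execute(query).fetchall()
print(issues_by_hour)
```

Run against the full table with `pd.read_sql(query, engine)`, the `issues` column would give a first estimate of support load per region and hour, which could then be shifted to local time before deciding on shifts.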
