## Graphing user metrics from Intercom data

This is a report on the state of ACME's business based on its Intercom data. ACME is an online platform that provides business and administration software for small businesses and self-employed individuals across a wide range of industries.

Before running this notebook, please **read the [Readme](https://kyso.io/KyleOS/intercom-template/file/README.md) and the [data preparation report](https://kyso.io/KyleOS/intercom-template/file/data-prep.ipynb)** to set up your environment and access your Intercom data.

In [1]:
# Imports

import pandas as pd
import plotly.express as px
import math
import json
import cufflinks as cf
import plotly.graph_objs as go    
import plotly.offline
import numpy as np
import re

cf.go_offline()
cf.set_config_file(offline=False, world_readable=True)

In the [data preparation report](https://kyso.io/KyleOS/intercom-template/file/data-prep.ipynb) we imported all our data from Intercom and cached it to JSON files. Now we can start by pulling in our user data and converting the timestamps to readable date/times.

In [2]:
df = pd.read_json('./data/users.json')

# Convert Unix timestamps (in seconds) to readable datetimes
timestamp_cols = ['created_at', 'updated_at', 'last_seen_at', 'last_replied_at', 'last_contacted_at']
for col in timestamp_cols:
    df[col] = pd.to_datetime(df[col], unit='s')
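
As a quick sanity check on the conversion above, the following minimal sketch shows what `unit='s'` does to raw epoch values. The two timestamps are hypothetical values chosen for illustration, not taken from the Intercom export.

```python
import pandas as pd

# Hypothetical epoch timestamps in seconds (not real Intercom data)
sample = pd.DataFrame({
    "created_at": [1564617600],
    "last_seen_at": [1596240000],
})

# unit="s" tells pandas the integers count seconds since 1970-01-01 UTC
sample["created_at"] = pd.to_datetime(sample["created_at"], unit="s")
sample["last_seen_at"] = pd.to_datetime(sample["last_seen_at"], unit="s")

print(sample)
```

Without `unit='s'`, pandas would interpret the integers as nanoseconds and return dates in early 1970.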

## Monthly Signups
Let's first plot the number of signups by month.

In [3]:
fig = px.histogram(df, x="created_at", nbins=24, width=800, height=600)
fig.layout.xaxis.title.text = 'Date'
fig.layout.yaxis.title.text = '# of Users'
fig.show()

- We can see our user growth take off in August 2019.
- 2020 started off great with a big spike in January.
- The coronavirus pandemic had a big impact on the number of signups: growth dropped between February and May before picking back up in June as lockdown restrictions were eased.
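
The histogram's `nbins=24` only approximates calendar months. If exact monthly counts are needed, one possible approach is to group on the month directly. This is a minimal sketch on a hypothetical sample; in the notebook, `df['created_at']` would take the place of `sample['created_at']`.

```python
import pandas as pd

# Hypothetical signup dates standing in for the real `created_at` column
sample = pd.DataFrame({
    "created_at": pd.to_datetime([
        "2019-08-03", "2019-08-17", "2019-09-01",
        "2020-01-05", "2020-01-20", "2020-01-28",
    ])
})

# Label each signup with its calendar month, then count signups per label
monthly_signups = (
    sample["created_at"]
    .dt.strftime("%Y-%m")
    .value_counts()
    .sort_index()
)

print(monthly_signups)
```

Note that months with no signups do not appear in the result at all.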

## User Churn
We're going to use the *'last seen date'* variable of users as a proxy for user churn.

In [4]:
fig = px.histogram(df[(df['last_seen_at'] < '2020-08-01')], x="last_seen_at", nbins=24, width=800, height=600)
fig.layout.xaxis.title.text = 'Last Seen Date'
fig.layout.yaxis.title.text = '# of Users'
fig.show()

- The graph above shows us the date of last activity for users by month.
- For example, we can see big spikes between February and March in the number of users that have not returned to the platform since, which suggests the coronavirus pandemic had a big impact on a lot of businesses.
- The number of users last seen in the most recent months, up to July, has also increased sharply. Keep in mind that signups grew massively over the same period, so the absolute number of churned users naturally grows too.
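
Because absolute churn grows with signups, it can help to look at churn as a share of each signup cohort. The sketch below is one way to do that, assuming the same "not seen since 2020-08-01" proxy used above. The data and the `churned` and `signup_month` column names are hypothetical.

```python
import pandas as pd

# Hypothetical users; the notebook would use `df` loaded from users.json
users = pd.DataFrame({
    "created_at": pd.to_datetime([
        "2020-01-10", "2020-01-15", "2020-06-02", "2020-06-20", "2020-06-25",
    ]),
    "last_seen_at": pd.to_datetime([
        "2020-02-01", "2020-08-15", "2020-07-10", "2020-08-20", "2020-08-05",
    ]),
})

cutoff = pd.Timestamp("2020-08-01")  # same cutoff as the churn histogram

# By this proxy, a user has churned if they were last seen before the cutoff
users["churned"] = users["last_seen_at"] < cutoff
users["signup_month"] = users["created_at"].dt.strftime("%Y-%m")

# Fraction of each signup cohort that has churned
churn_share = users.groupby("signup_month")["churned"].mean()

print(churn_share)
```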

## Age of active users
We define *active users* as those who have been online within the last 2 months. We measure each active user's age, the time between signup and last activity, in weeks.

In [5]:
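# .copy() gives an independent frame, so adding columns does not trigger SettingWithCopyWarning
actives = df[df['last_seen_at'] > '2020-08-01'].copy()
actives['age_delta'] = actives['last_seen_at'] - actives['created_at']
# Age in weeks: whole days between signup and last activity, divided by 7
actives['age'] = actives['age_delta'].dt.days / 7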

In [6]:
fig = px.histogram(x=actives[actives['age'] > 0]['age'].tolist(), labels={'x': 'Age in Weeks'}, width=800, height=600)

fig.update_traces(xbins=dict(
    start=0.0,
    end=365,
    size=1
))

fig.layout.yaxis.title.text = '# of Users'


fig.show()

- We can see that the large majority of our active users signed up within the last 20 weeks, or roughly 5 months.
- Almost no users who signed up more than 5 months ago are still active.
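
The active-user cutoff is hard-coded as `'2020-08-01'` in the cell above. A minimal sketch of deriving it from the 2-month definition follows. The `report_date` value is an assumption (the notebook does not state when the report was run), and the sample users are hypothetical.

```python
import pandas as pd

# Hypothetical users standing in for the real `df`
users = pd.DataFrame({
    "created_at": pd.to_datetime(["2020-03-01", "2020-07-01", "2019-01-01"]),
    "last_seen_at": pd.to_datetime(["2020-09-15", "2020-08-12", "2020-02-01"]),
})

# Assumed reporting date; two months earlier gives the 2020-08-01 cutoff
report_date = pd.Timestamp("2020-10-01")
cutoff = report_date - pd.DateOffset(months=2)

# Keep users seen after the cutoff; .copy() avoids SettingWithCopyWarning
actives = users[users["last_seen_at"] > cutoff].copy()

# Age in weeks between signup and last activity
actives["age_weeks"] = (actives["last_seen_at"] - actives["created_at"]).dt.days / 7

print(actives)
```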

## User browsers

It's important for us to have a general idea of which browsers the majority of our audience prefers. Why? It tells us which browsers we need to use when testing. Also, the dev team may need to alter some code like certain CSS aspects if a lot of users are using older browsers.

In [7]:
fig = px.histogram(df, x="browser", width=800, height=600)

fig.layout.xaxis.title.text = 'Browser'
fig.layout.yaxis.title.text = '# of Users'

fig.show()

- Note that this graph covers our entire dataset, going back to October 2018.
- As expected, the large majority of users use either Chrome or Safari.
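
To put numbers on "large majority", the counts behind the histogram can be turned into percentages. This is a minimal sketch on hypothetical browser values; in the notebook, `df['browser']` would replace the sample series.

```python
import pandas as pd

# Hypothetical browser values, including one missing entry
browsers = pd.Series(
    ["chrome", "chrome", "safari", "firefox", "chrome", None],
    name="browser",
)

# value_counts drops missing values by default; normalize=True returns fractions
share = browsers.value_counts(normalize=True).mul(100).round(1)

print(share)
```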

## User operating systems

As with browsers, the operating systems our users run could be of interest to the team.

In [8]:
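from string import digits

# Translation table that deletes every digit
remove_digits = str.maketrans('', '', digits)

def parse_os(os):
    # Strip version numbers (digits and dots), leaving only the OS name
    return os.translate(remove_digits).replace(".", "").strip()

def parse_platform(os):
    # Treat Android and iOS as mobile, everything else as desktop
    return "mobile" if os in ('Android', 'iOS') else "desktop"

# Series.map is used here because DataFrame.applymap is deprecated in recent pandas
df_os = df[['os']].dropna().copy()
df_os['os'] = df_os['os'].map(parse_os)
df_os['platform'] = df_os['os'].map(parse_platform)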

In [9]:
fig = px.histogram(df_os, x="os", width=800, height=600)

fig.layout.xaxis.title.text = 'Operating System'
fig.layout.yaxis.title.text = '# of Users'

fig.show()
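
To illustrate what the OS parsing does, here is a small self-contained sketch that runs the same two helpers on a few sample strings. The sample values are assumptions about what Intercom's `os` field might contain, not values taken from the data.

```python
from string import digits

# Translation table that deletes every digit
remove_digits = str.maketrans("", "", digits)

def parse_os(os_name):
    # Drop version digits and dots, then trim leftover whitespace
    return os_name.translate(remove_digits).replace(".", "").strip()

def parse_platform(os_name):
    # Android and iOS count as mobile, everything else as desktop
    return "mobile" if os_name in ("Android", "iOS") else "desktop"

# Hypothetical raw values for the `os` field
samples = ["Windows 10", "OS X 10.15.6", "Android 9", "iOS 13.6.1"]

parsed = [parse_os(s) for s in samples]
platforms = [parse_platform(p) for p in parsed]

for raw, name, platform in zip(samples, parsed, platforms):
    print(f"{raw!r} -> {name!r} ({platform})")
```

One consequence of this design is that any OS name not matching `Android` or `iOS` exactly, including unknown or malformed values, is classified as desktop.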

## User platforms

As with browsers and operating systems, knowing whether users open the app more often on desktop or mobile is extremely important to the engineering team.

In [10]:
fig = px.histogram(df_os, x="platform", width=800, height=600)

fig.layout.xaxis.title.text = 'Platform'
fig.layout.yaxis.title.text = '# of Users'

fig.show()

## Geographical breakdown

For the sake of readability, we've filtered the data to only include those countries with over 10 users.

In [32]:
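import ast

# Assumes each location is a string holding a Python dict literal;
# literal_eval parses it without executing arbitrary code, unlike eval
locations = pd.DataFrame(df.location.apply(ast.literal_eval).apply(pd.Series))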
locations = locations.groupby('country').filter(lambda x : len(x)>10)

fig = px.histogram(x = locations['country'], width=800, height=600)

fig.layout.xaxis.title.text = 'Country'
fig.layout.yaxis.title.text = '# of Users'

fig.update_layout(xaxis={'categoryorder':'total descending'})

fig.show()

- The vast majority of our users log in from India or the United States, followed by Spain.
- Of the next 34 countries, only Ireland, Brazil, Sweden and the United Kingdom have 100 users or more.
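
The country filter can also be expressed with `value_counts`, which makes the per-country totals available for inspection. This is a minimal sketch on hypothetical data, with a lower threshold than the notebook's 10 so the toy sample shows the effect; `min_users` is a name introduced here for illustration.

```python
import pandas as pd

# Hypothetical locations; the notebook builds `locations` from users.json
locations = pd.DataFrame({
    "country": ["India"] * 4 + ["Spain"] * 3 + ["Ireland"],
})

min_users = 2  # the notebook uses 10

# Count users per country, then keep only countries above the threshold
counts = locations["country"].value_counts()
keep = counts[counts > min_users].index
filtered = locations[locations["country"].isin(keep)]

print(counts)
print(filtered["country"].unique())
```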

## Median time to first reply
How quickly does our team respond to the very first message in a chat?

In [41]:
convos = pd.read_json('./data/conversations.json')
convo_stats = convos['statistics'].apply(pd.Series)

In [42]:
fig = px.histogram(x=convo_stats['time_to_admin_reply']/3600, nbins=10000, width=800, height=600)

fig.update_xaxes(range=[0, 6], tick0=0, dtick=1, tickangle=45, tickfont=dict(size=8))
fig.layout.xaxis.title.text = 'Time to Reply (hours)'
fig.layout.yaxis.title.text = '# of Conversations'

fig.show()

- Our response time is very good, with a large majority of first messages being answered within the first 10 minutes.
- Another big chunk of first messages are answered within 20 minutes.
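
The histogram shows the distribution, but the median itself can be computed directly. The sketch below assumes reply times are in seconds, as the division by 3600 in the plot implies. The values are hypothetical stand-ins for `convo_stats['time_to_admin_reply']`.

```python
import pandas as pd

# Hypothetical reply times in seconds; None marks a conversation with no admin reply
time_to_admin_reply = pd.Series([120, 300, 540, 900, 1500, 7200, None])

# median() skips missing values by default
median_minutes = time_to_admin_reply.median() / 60

# Share of answered conversations that got a first reply within 10 minutes
answered = time_to_admin_reply.dropna()
within_10_min = (answered <= 600).mean()

print(f"Median time to first reply: {median_minutes:.1f} minutes")
print(f"Answered within 10 minutes: {within_10_min:.0%}")
```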

## Median time to all replies

In [43]:
fig = px.histogram(x=convo_stats['median_time_to_reply']/3600, nbins=1000, width=800, height=600)


fig.layout.xaxis.title.text = 'Time to Reply (hours)'
fig.layout.yaxis.title.text = '# of Conversations'

fig.update_xaxes(range=[0, 6], tick0=0, dtick=1, tickangle=45, tickfont=dict(size=8))

fig.show()

- Our response times remain good as conversations continue.

## Number of messages per conversation

In [16]:
fig = px.histogram(x=convo_stats['count_conversation_parts'], width=800, height=600)

fig.layout.xaxis.title.text = '# of Messages in Conversation'
fig.layout.yaxis.title.text = '# of Conversations'

fig.update_xaxes(tick0=0, dtick=5, tickangle=45, tickfont=dict(size=8))

fig.show()

- The graph shows a roughly bimodal distribution in the number of messages per conversation.
- The first spike is likely due to minor bugs or queries that tend to be solved/answered almost immediately.
- The second spike then probably pertains to more complex issues, peaking at around 14 messages before dropping off.
- Very few conversations require more than 30 messages.