# Product Insight Analyst Case Study #

We have two similar systems in Xero, Full BAS and Simple BAS, that users rely on to fulfil their compliance needs with the ATO. Attached are extracts of the times at which each report was run by our users (simplebas.csv and fullbas.csv), and some metadata about the organisations (orgcard.csv).



Limit your analysis to active, paying organisations (orgs).

In this scenario, you are looking to present insights to the product manager of BAS at Xero. They would like to see:
1. an overview of how and when users interact with each system;
2. a comparison in terms of usage of each system;
3. a view of seasonality that could inform the need to increase compute capacity;
4. your recommendation on what the best time of month would be to have a two-hour outage window for each system.

Here are some key data points to include in your data story:
- What month saw the most report runs?
- How many orgs ran both reports?
- How many users have used either report?
- How many users have run reports for multiple organisations?
- Which pricing plans are the most popular amongst organisations using BAS?
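
The first four data points reduce to a handful of group-by counts once both run logs share one schema. The following is a minimal sketch, not the final analysis: it assumes a combined frame `runs` with columns `system`, `userid`, `orgid` and `timestamp` (these names and the five sample rows are made up for illustration and do not come from the extracts).

```python
import pandas

# Hypothetical combined run log, one row per report run.
# Column names and values are illustrative only.
runs = pandas.DataFrame({
    "system": ["full", "full", "simple", "simple", "full"],
    "userid": ["u1", "u1", "u2", "u3", "u3"],
    "orgid": ["o1", "o2", "o1", "o3", "o3"],
    "timestamp": pandas.to_datetime([
        "2017-07-20 01:00", "2017-07-21 02:00", "2017-07-21 03:00",
        "2017-08-01 04:00", "2017-10-15 05:00",
    ]),
})


def key_metrics(runs):
    """Return the headline counts asked for in the brief."""
    # Calendar month (year and month) of each run.
    months = runs["timestamp"].dt.to_period("M")
    # Number of distinct systems each org has used.
    systems_per_org = runs.groupby("orgid")["system"].nunique()
    # Number of distinct orgs each user has run reports for.
    orgs_per_user = runs.groupby("userid")["orgid"].nunique()
    return {
        "busiest_month": str(months.value_counts().idxmax()),
        "orgs_running_both": int((systems_per_org > 1).sum()),
        "users_either": int(runs["userid"].nunique()),
        "users_multi_org": int((orgs_per_user > 1).sum()),
    }


metrics = key_metrics(runs)
```

Whether "most report runs" should be counted per calendar month or per month of the year (all Julys pooled) is a judgment call; the sketch uses calendar months.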

### Note:
Important variables: timestamps, userid, orgid, productoption, payingflag, runtime.

### Main Idea:

We can build a dashboard that performs time series analyses on the FullBAS and SimpleBAS data. \
First, plot general system usage (user- and org-agnostic) for Full and Simple as two overlaid histograms. \
Then break that view down by organisation (we might even be able to classify organisations by usage frequency here). \
Finally, home in on FullBAS, the only extract with a runtime column: weight the histograms by runtime to identify bottlenecks and choke periods, which shows where more compute capacity is needed, and spot quiet periods for the best outage window.
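
One way to turn the runtime-weighted view into an outage recommendation is to sum runtime per (day of month, hour) slot and pick the two adjacent hours with the smallest total. This is a rough sketch under stated assumptions: the frame has a parsed `timestamp` column (a hypothetical name) next to `runtime`, windows do not cross midnight, and the demo data is synthetic.

```python
import pandas


def quietest_two_hour_slot(runs):
    """Return (day_of_month, start_hour, load) for the two-hour slot
    with the least total runtime. Windows stay within one day."""
    ts = runs["timestamp"]
    # Total runtime for every (day of month, hour) combination.
    load = (
        runs.groupby([ts.dt.day.rename("day"), ts.dt.hour.rename("hour")])["runtime"]
        .sum()
        .unstack("hour", fill_value=0.0)
        .reindex(index=range(1, 32), columns=range(24), fill_value=0.0)
    )
    # Load of the window starting at hour h is load[h] + load[h + 1].
    window = load.iloc[:, :-1].to_numpy() + load.iloc[:, 1:].to_numpy()
    day_pos, start_hour = divmod(int(window.argmin()), window.shape[1])
    return int(load.index[day_pos]), int(start_hour), float(window[day_pos, start_hour])


# Synthetic demo: one run per hour across January 2017, each taking 1.0,
# except a deliberately quiet patch on the 10th at 03:00 and 04:00.
stamps = pandas.Timestamp("2017-01-01") + pandas.to_timedelta(list(range(31 * 24)), unit="h")
demo = pandas.DataFrame({"timestamp": stamps, "runtime": 1.0})
quiet = (demo["timestamp"].dt.day == 10) & demo["timestamp"].dt.hour.isin([3, 4])
demo.loc[quiet, "runtime"] = 0.1

slot = quietest_two_hour_slot(demo)
```

On real data spanning many months, slots on days 29 to 31 occur less often and slots with no runs at all show as zero load, so averaging per calendar month before comparing slots is probably the safer choice.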



We can implement some basic filters:
1. time
2. payingflag
3. productoption
4. organisationstatus
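
The four filters above can be sketched as two small helpers, one for the org metadata and one for the run log. The column names `payingflag`, `organisationstatus` and `productoption` come from orgcard.csv; the `timestamp` column, the helper names and the "Cancelled" status value in the demo are assumptions for illustration.

```python
import pandas


def filter_orgs(orgs, payingflag=1, statuses=("Active",), productoptions=None):
    """Keep orgs matching the paying flag, status and (optionally) plan."""
    mask = orgs["payingflag"].eq(payingflag) & orgs["organisationstatus"].isin(list(statuses))
    if productoptions is not None:
        mask &= orgs["productoption"].isin(list(productoptions))
    return orgs[mask]


def filter_time(runs, start=None, end=None, column="timestamp"):
    """Keep runs with start <= timestamp < end; either bound may be omitted.
    For a timezone-aware column, pass timezone-aware bounds."""
    mask = pandas.Series(True, index=runs.index)
    if start is not None:
        mask &= runs[column] >= pandas.Timestamp(start)
    if end is not None:
        mask &= runs[column] < pandas.Timestamp(end)
    return runs[mask]


# Made-up rows for illustration only.
demo_orgs = pandas.DataFrame({
    "organisationstatus": ["Active", "Active", "Cancelled", "Active"],
    "payingflag": [1, 0, 1, 1],
    "productoption": ["Standard", "Standard", "Premium 5", "Premium 5"],
})
demo_runs = pandas.DataFrame({
    "timestamp": pandas.to_datetime(["2017-06-30", "2017-07-15", "2017-08-01"]),
})

active_paying = filter_orgs(demo_orgs)
standard_only = filter_orgs(demo_orgs, productoptions=["Standard"])
july_runs = filter_time(demo_runs, start="2017-07-01", end="2017-08-01")
```

The defaults of `filter_orgs` match the brief's restriction to active, paying organisations.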

In [2]:
import pandas
from jupyter_dash import JupyterDash
from dash import dcc
from dash import html
from dash.dependencies import Input, Output

In [2]:
orgdetails = pandas.read_csv('data/orgcard.csv')
orgdetails.head(5)

Unnamed: 0,organisationid,shardid,shortcode,organisationstatus,createddateutc,signupdateutc,productoption,deleteddateutc,saleschannel,channeltype,...,salestaxperiod,salestaxtype,incometaxbasis,provisionaltaxbasis,managedbypracticeflag,payingflag,marketcode,nonpracticestaffusers,practicestaffusers,trialflag
0,CDEAB7FB-E43F-4344-904D-3137FED5AAC7,87,!Qddkq,Active,2015-10-14 05:32:10.737,2020-02-13 00:00:00,Standard,,Partner,Accountant,...,Australian Quarterly (Option 1),Australian GST,,,1,1,AU,2,6,0
1,726A4578-00D6-4357-BFE7-85D1A656EEBF,121,!DhWsK,Active,2018-08-06 00:52:15.52,2018-08-06 00:00:00,Standard,,Partner,Accountant,...,Australian Quarterly (Option 1),Australian GST,,,1,1,AU,0,15,0
2,5CDF4968-430D-4C55-A52E-845CD144B786,116,!QPRI6,Active,2014-11-02 23:26:12.553,2014-11-03 00:00:00,Standard,,Partner,Accountant,...,Australian Quarterly (Option 1),Australian GST,,,1,1,AU,1,18,0
3,8F483757-ACE6-4A36-8805-10481BCCAC42,116,!4VKHh,Active,2014-10-22 01:55:59.02,2014-10-22 00:00:00,Standard,,Partner,Bookkeeper,...,Australian Quarterly (Option 1),Australian GST,,,1,1,AU,1,3,0
4,195AA44C-DD6D-42A9-914A-552086303B2B,133,!hWiKs,Active,2015-12-07 00:09:08,2017-03-15 00:00:00,Premium 5,,Partner,Bookkeeper,...,Australian Monthly,Australian GST,Australian Quarterly Installment (Option 1),,1,1,AU,1,11,0
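
With `productoption` available per organisation, plan popularity among BAS users is a `value_counts` over the orgs seen in the run logs. A minimal sketch follows, assuming `organisationid` here and `orgid` in the run logs are the same identifier apart from letter case (suggested by the samples but not confirmed); the function name and the short ids are hypothetical.

```python
import pandas


def plan_popularity(orgs, bas_org_ids):
    """Count pricing plans among orgs whose id appears in bas_org_ids.
    Ids are compared case-insensitively."""
    wanted = {str(org_id).upper() for org_id in bas_org_ids}
    used = orgs[orgs["organisationid"].str.upper().isin(wanted)]
    return used["productoption"].value_counts()


# Made-up ids for illustration only.
demo_orgs = pandas.DataFrame({
    "organisationid": ["AAA-1", "BBB-2", "CCC-3", "DDD-4"],
    "productoption": ["Standard", "Standard", "Premium 5", "Premium 5"],
})
popularity = plan_popularity(demo_orgs, ["aaa-1", "bbb-2", "ccc-3"])
```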


In [3]:
fullbas = pandas.read_csv('data/fullbas.csv')
simplebas = pandas.read_csv('data/simplebas.csv')

fullbas.head(5)


Unnamed: 0,datestring,timestring,level,userid,orgid,reportname,versioncode,countrycode,runtime
0,2016-10-11,23:55:52.2119Z,INFO,e07083a1-8528-4bf7-8c43-f4e33066e889,6220e36d-457e-4221-953c-8d1c8f07a3b1,ActivityStatement,VERSION/AU,CNTRY/AU,4.396793
1,2016-10-11,23:51:09.5648Z,INFO,55681c55-2199-4a05-aa14-abc965c17829,256263f5-f1e2-457c-96db-382492e5e7cf,ActivityStatement,VERSION/AU,CNTRY/AU,0.34429
2,2016-10-11,23:51:09.1586Z,INFO,55681c55-2199-4a05-aa14-abc965c17829,256263f5-f1e2-457c-96db-382492e5e7cf,ActivityStatement,VERSION/AU,CNTRY/AU,0.631424
3,2016-10-11,23:47:20.0615Z,INFO,0389c310-812a-45ba-b61b-a6f0bf5fb4ff,bed924b2-41d9-4151-acba-62725556b5dc,ActivityStatement,VERSION/AU,CNTRY/AU,0.159535
4,2016-10-11,23:47:16.3896Z,INFO,0389c310-812a-45ba-b61b-a6f0bf5fb4ff,bed924b2-41d9-4151-acba-62725556b5dc,ActivityStatement,VERSION/AU,CNTRY/AU,0.239303
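
The Full BAS extract splits each run time into `datestring` and `timestring`, and the trailing `Z` suggests UTC. Below is a sketch of combining them into one timestamp and a local-time copy for time-of-day analysis. The choice of `Australia/Sydney` is an assumption (Australian users span several time zones), and the helper and new column names are hypothetical.

```python
import pandas


def add_timestamps(fullbas, local_tz="Australia/Sydney"):
    """Add a UTC timestamp and a local-time copy to the Full BAS log."""
    out = fullbas.copy()
    # Join date and time into one ISO 8601 string, then parse as UTC.
    out["timestamp_utc"] = pandas.to_datetime(
        out["datestring"] + "T" + out["timestring"], utc=True
    )
    # Local time is what matters for "when do users run reports".
    out["timestamp_local"] = out["timestamp_utc"].dt.tz_convert(local_tz)
    return out


# First two rows of the sample output above.
sample = pandas.DataFrame({
    "datestring": ["2016-10-11", "2016-10-11"],
    "timestring": ["23:55:52.2119Z", "23:51:09.5648Z"],
    "runtime": [4.396793, 0.34429],
})
stamped = add_timestamps(sample)
```

Under that assumption, runs logged just before midnight UTC fall in the middle of the following morning in Sydney, which changes how the hourly histograms read.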


In [4]:
simplebas.head(5)

Unnamed: 0,userid,shortcode,statementyear,statementmonth,datetime
0,39e0876d-42df-495f-9309-3d18049f6eb4,!V5xDN,2016,12,2017-09-15 01:54:13
1,3fa90fc4-0fe2-4c87-97e4-62d3e3caa0be,!WTu7O,2017,9,2017-09-15 01:19:27
2,4f124d5c-0eac-4fb1-8445-cac716346a22,!8xmRn,2017,8,2017-09-15 00:55:54
3,4f124d5c-0eac-4fb1-8445-cac716346a22,!8xmRn,2017,7,2017-09-15 00:55:53
4,4f124d5c-0eac-4fb1-8445-cac716346a22,!8xmRn,2017,9,2017-09-15 00:55:44
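
The Simple BAS extract identifies organisations by `shortcode` rather than by id, so comparing it with Full BAS needs a lookup through orgcard.csv. This sketch assumes `shortcode` is unique per organisation and that lower-casing `organisationid` yields the `orgid` used in the Full BAS log; both are assumptions, and the shortcodes in the demo are made up. The time zone of the `datetime` column is not stated in the extract, so it is parsed as-is.

```python
import pandas


def simplebas_with_org(simplebas, orgs):
    """Attach an orgid and a parsed timestamp to the Simple BAS log."""
    lookup = orgs[["shortcode", "organisationid"]].drop_duplicates(subset="shortcode")
    out = simplebas.merge(lookup, on="shortcode", how="left", validate="many_to_one")
    # Lower-case to line up with the id format seen in the Full BAS log.
    out["orgid"] = out["organisationid"].str.lower()
    out["timestamp"] = pandas.to_datetime(out["datetime"])
    return out.drop(columns="organisationid")


# Made-up shortcodes for illustration only.
demo_simple = pandas.DataFrame({
    "userid": ["u1", "u2"],
    "shortcode": ["!aaaaa", "!zzzzz"],
    "datetime": ["2017-09-15 01:54:13", "2017-09-15 01:19:27"],
})
demo_orgs = pandas.DataFrame({
    "shortcode": ["!aaaaa", "!bbbbb"],
    "organisationid": [
        "CDEAB7FB-E43F-4344-904D-3137FED5AAC7",
        "726A4578-00D6-4357-BFE7-85D1A656EEBF",
    ],
})
linked = simplebas_with_org(demo_simple, demo_orgs)
```

Rows whose shortcode has no match keep a missing `orgid`, which makes it easy to count how many runs fall outside the org metadata.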


In [3]:
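app = JupyterDash(__name__)
app.layout = html.Div([
    html.H1("Random datastream"),
    # Fires once per second; no callback consumes it yet.
    dcc.Interval(
        id='interval-component',
        interval=1*1000,  # in milliseconds
        n_intervals=0
    ),
    # Empty placeholder graph until a figure is supplied.
    dcc.Graph(id='graph'),
])
# Render the app inline in the notebook output cell.
app.run_server(mode='inline', port=8050)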
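
The graph in the layout has no figure yet. One possible next step is the pair of overlaid histograms from the main idea, built here as a plain Plotly figure dictionary so the sketch needs only pandas. The function name, trace labels and axis titles are assumptions.

```python
import pandas


def usage_figure(full_times, simple_times):
    """Build a figure dict with two overlaid histograms of run times."""
    def trace(times, name):
        return {
            "type": "histogram",
            "x": [t.isoformat() for t in pandas.to_datetime(times)],
            "name": name,
            "opacity": 0.6,  # keep both distributions visible where they overlap
        }

    return {
        "data": [trace(full_times, "Full BAS"), trace(simple_times, "Simple BAS")],
        "layout": {
            "barmode": "overlay",
            "title": {"text": "Report runs over time"},
            "xaxis": {"title": {"text": "Run time"}},
            "yaxis": {"title": {"text": "Runs"}},
        },
    }


# Made-up timestamps for illustration only.
fig = usage_figure(
    ["2017-07-20 01:00", "2017-07-21 02:00"],
    ["2017-07-21 03:00"],
)
```

The dictionary can be passed as `figure=` to `dcc.Graph`, or returned from a callback whose output is `Output('graph', 'figure')`.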