# Analyzing VSS in Universities

This notebook analyzes the use of VSS in academia.

From Mark:

>We think that academic VSS usage is heavily weighted toward RFB/RFI, so it would be helpful if we could get a report that compares:
> 
>  - VSS usage vs. MWO usage (what % of academic usage is [any form of] VSS?)
>  - RFB/RFI usage vs. VSS time domain usage
> 
>If possible, VSS usage on non-example projects, so either a new,
>user-created project or a user-modified example project

## Set Up the Data

In [1]:
import sys
from datetime import datetime
from IPython.display import Markdown as md
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
plt.rcParams['figure.figsize'] = (12, 8)  # set default plot size

In [2]:
data_path = '/data/a/tlm/sessions/'
sdf = pd.read_csv(data_path + 'session.csv', encoding='latin1', low_memory=False)
print('Total sessions = {}'.format(len(sdf)))
sddf = pd.read_csv(data_path + 'sessiondata.csv', low_memory=False)

# cleanup things that we don't want to count
mask1 = sddf.ident.isin(['SYM2BITS', 'PORT', 'PORT_TN', 'Miscellaneous:Separator'])
mask2 = sddf.ident.notnull()
sddf = sddf[~mask1 & mask2]
print('Total sessiondata records = ', len(sddf))

s = dict(academic={}, commercial={})

Total sessions = 992414
Total sessiondata records =  18997239
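
The cleanup above needs both masks because `isin` returns `False` for missing values, so negating it alone would keep rows whose `ident` is null. Here is a minimal sketch on a made-up frame (the toy rows are assumptions, not real telemetry) showing how the two masks combine:

```python
import numpy as np
import pandas as pd

# Toy stand-in for sessiondata.csv; the rows are invented for illustration.
toy = pd.DataFrame({
    'session_id': [1, 1, 2, 2, 3],
    'ident': ['PORT', 'VSS Time Domain', np.nan, 'SYM2BITS', 'Default Linear'],
})

excluded = ['SYM2BITS', 'PORT', 'PORT_TN', 'Miscellaneous:Separator']
is_excluded = toy.ident.isin(excluded)  # a null ident is not in the list, so this is False for it
has_ident = toy.ident.notnull()         # hence the separate null check

cleaned = toy[~is_excluded & has_ident]
print(cleaned.ident.tolist())
```

Only the rows with a non-null `ident` outside the exclusion list survive.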


In [3]:
# get only academics
adf = sdf[(sdf.user_type == 'academic') & (sdf.app == 'M')]
print('Number of academic sessions:', len(adf))

# let's keep only those in this academic semester
adf = adf[adf.created_time >= '2018-08-15']
print('Academic sessions this semester', len(adf))

# remove sessions that have no commands
adf = adf[adf.has_commands].copy()  # copy so later column assignments don't act on a slice
print('Sessions with commands', len(adf))
s['academic']['Sessions with commands'] = len(adf)

Number of academic sessions: 217159
Academic sessions this semester 30431
Sessions with commands 23591


In [4]:
# convert create time from string to datetime
adf['created_time'] = pd.to_datetime(adf.created_time)
# add week
adf['week'] = adf.created_time.apply(lambda x: (x.year - 2010) * 100 + int(x.strftime('%U')))

# add user-week
adf['user_week'] = adf.user_id * 1000 + adf.week

# cleanup columns
to_drop = ['has_commands', 'user_type', 'custid', 'runtime', 'state', 'auto_proj',
           'guid', 'instid', 'app', 'start_user', 'iu_name', 'created_time']
adf = adf.drop(to_drop, axis=1)
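
The week key above is built row by row with `apply`. As a sketch only, the same encoding can be computed with the vectorized `.dt` accessor. The toy frame below is an assumption for illustration. Note that the `user_id * 1000 + week` key relies on `week` staying below 1000, which holds for sessions created before 2020.

```python
import pandas as pd

# Toy sessions; user ids and timestamps are invented for illustration.
toy = pd.DataFrame({
    'user_id': [7, 7, 9],
    'created_time': ['2018-08-15 09:00:00', '2018-08-20 14:30:00', '2019-01-02 08:00:00'],
})
toy['created_time'] = pd.to_datetime(toy.created_time)

# Same encoding as the cell above: (years since 2010) * 100 + %U week number.
week_no = toy.created_time.dt.strftime('%U').astype(int)
toy['week'] = (toy.created_time.dt.year - 2010) * 100 + week_no

# week has at most three digits before 2020, so * 1000 keeps user and week apart.
toy['user_week'] = toy.user_id * 1000 + toy.week
print(toy[['user_id', 'week', 'user_week']])
```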

In [5]:
# cleanup sddf columns
sddf.drop(['id', 'opncnt'], axis=1, inplace=True)

In [6]:
# merge sdf and sddf
df = pd.merge(adf, sddf, on='session_id')
len(df)

510199
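
`pd.merge` defaults to an inner join, so sessions with no sessiondata records (and records whose session was filtered out) silently drop from `df`. A quick way to see how much falls away is an outer merge with `indicator=True`. The frames below are toy assumptions, not the real data:

```python
import pandas as pd

# Toy frames: session 2 has no records, and session 4 has a record but no session row.
sessions = pd.DataFrame({'session_id': [1, 2, 3], 'user_id': [10, 10, 11]})
records = pd.DataFrame({'session_id': [1, 1, 3, 4], 'ident': ['A', 'B', 'C', 'D']})

# indicator=True adds a '_merge' column saying which side each row came from.
check = pd.merge(sessions, records, on='session_id', how='outer', indicator=True)
unmatched_sessions = check.loc[check['_merge'] == 'left_only', 'session_id'].tolist()
orphan_records = check.loc[check['_merge'] == 'right_only', 'session_id'].tolist()

# The default inner merge keeps only rows present on both sides.
inner = pd.merge(sessions, records, on='session_id')
print(len(inner), unmatched_sessions, orphan_records)
```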

## Define what we are looking for

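In this section we define the per-user features we are looking for: whether a user placed a VSS or MWO element, and whether they ran each type of simulation. A simulation type counts as used only when the user's summed run count for it exceeds 5.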

In [7]:
def placed_vss_element(udf):
    mask1 = udf.newcnt > 0
    mask2 = udf.category == 'VSSElement'
    return 1 if len(udf[mask1 & mask2]) > 0 else 0

In [8]:
def placed_mwo_element(udf):
    mask1 = udf.newcnt > 0
    mask2 = udf.category == 'MWOElement'
    return 1 if len(udf[mask1 & mask2]) > 0 else 0

In [9]:
def ran_linear_simulation(udf):
    sim_idents = ['LinCktSimAWR', 'Default Linear', 'APLAC Linear']
    sim_count = udf[udf.ident.isin(sim_idents)]['count'].sum()
    return 1 if (sim_count > 5) else 0

In [10]:
def ran_vss_simulation(udf):
    sim_idents = ['VSS Time Domain']
    sim_count = udf[udf.ident.isin(sim_idents)]['count'].sum()
    return 1 if (sim_count > 5) else 0

In [11]:
def ran_rfb_simulation(udf):
    sim_idents = ['VSS RF Budget Analysis']
    sim_count = udf[udf.ident.isin(sim_idents)]['count'].sum()
    return 1 if (sim_count > 5) else 0

In [12]:
def ran_rfi_simulation(udf):
    sim_idents = ['VSS RF Inspector']
    sim_count = udf[udf.ident.isin(sim_idents)]['count'].sum()
    return 1 if (sim_count > 5) else 0
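
The four simulation checks above differ only in their list of idents. One possible way to remove the repetition is a small factory. `make_sim_check`, `ran_vss` and `ran_rfb` are hypothetical names introduced here, and the toy frame is an assumption for illustration:

```python
import pandas as pd

def make_sim_check(sim_idents, min_count=5):
    """Hypothetical helper: build a check that flags a user whose summed
    'count' over the given idents exceeds min_count."""
    def check(udf):
        sim_count = udf[udf.ident.isin(sim_idents)]['count'].sum()
        return 1 if sim_count > min_count else 0
    return check

ran_vss = make_sim_check(['VSS Time Domain'])
ran_rfb = make_sim_check(['VSS RF Budget Analysis'])

# Toy records for one user: 7 time-domain runs in total, 5 RF budget runs.
toy = pd.DataFrame({
    'ident': ['VSS Time Domain', 'VSS Time Domain', 'VSS RF Budget Analysis'],
    'count': [4, 3, 5],
})
print(ran_vss(toy), ran_rfb(toy))
```

Because the comparison is strict, a user with exactly 5 runs is not counted, which matches the behavior of the functions above.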

### Filter out the simulations that are based on example projects

In [13]:
df_noex = df[df.proj_name.isnull()]

In [14]:
v = 0
m = 0
users = set()
for uid, udf in df_noex.groupby('user_id'):
    rv = placed_vss_element(udf)
    v += rv
    rm = placed_mwo_element(udf)
    m += rm
    if rm or rv:
        users.add(uid)
    
tot_users = len(df_noex.user_id.unique())
with_ele = len(users)

md("""
First, let's look at the number of users that have placed elements in a schematic or
system diagram. This gives us an indication of creation of a new document and eliminates
the people that are just opening documents and running them.

* {} users placed an element in a schematic ({}%)
* {} users placed an element in a system diagram ({}%)

Note there are {} total users and out of those {} have placed an element of either type.
""".format(m, int(round(100*m/tot_users)),
           v, int(round(100*v/tot_users)), tot_users, with_ele))
s['academic']['Pct users placing schematic element'] = int(round(100*m/tot_users))
s['academic']['Pct users placing system element'] = int(round(100*v/tot_users))
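
The per-user loop above can also be expressed as a filter followed by `nunique`. This is a sketch on an invented frame, not a drop-in replacement, and the toy values are assumptions:

```python
import pandas as pd

# Toy merged records; user 1 opened a VSS element but never placed one (newcnt == 0).
toy = pd.DataFrame({
    'user_id':  [1, 1, 2, 3, 3, 4],
    'category': ['MWOElement', 'VSSElement', 'MWOElement', 'VSSElement', 'MWOElement', 'Graph'],
    'newcnt':   [2, 0, 1, 3, 1, 5],
})

placed = toy[toy.newcnt > 0]
users_by_category = placed.groupby('category').user_id.nunique()

tot_users = toy.user_id.nunique()
mwo_users = int(users_by_category.get('MWOElement', 0))
vss_users = int(users_by_category.get('VSSElement', 0))
# Users who placed at least one element of either type.
either = placed[placed.category.isin(['MWOElement', 'VSSElement'])].user_id.nunique()
print(mwo_users, vss_users, tot_users, either)
```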

In [15]:
v = 0
rfb = 0
rfi = 0
m = 0
for uid, udf in df_noex.groupby('user_id'):
    v += ran_vss_simulation(udf)
    m += ran_linear_simulation(udf)
    rfb += ran_rfb_simulation(udf)
    rfi += ran_rfi_simulation(udf)
md("""
Let's look at the number of users that have run simulations and the types
of simulations they run.

* {} users ran a linear simulation
* {} users ran a system simulation
* {} users ran an rfi simulation
* {} users ran an rfb simulation
""".format(m, v, rfi, rfb))
s['academic']['Users linear sim'] = m
s['academic']['Users system sim'] = v
s['academic']['Users rfi sim'] = rfi
s['academic']['Users rfb sim'] = rfb

## Comparing to Commercial Customers

These numbers seem reasonable, but without context it is hard to know whether they are good or bad. Let's look at the same metrics for our commercial customers to get a feel for what these ratios should be.

In [16]:
# get only customers
cdf = sdf[(sdf.user_type == 'customer') & (sdf.app == 'M')]
print('Number of customer sessions:', len(cdf))

# keep only sessions from the same period as the academic semester
cdf = cdf[cdf.created_time >= '2018-08-15']
print('Customer sessions this semester', len(cdf))

# remove sessions that have no commands
cdf = cdf[cdf.has_commands].copy()  # copy so later column assignments don't act on a slice
print('Sessions with commands', len(cdf))
s['commercial']['Sessions with commands'] = len(cdf)

# convert create time from string to datetime
cdf['created_time'] = pd.to_datetime(cdf.created_time)
# add week
cdf['week'] = cdf.created_time.apply(lambda x: (x.year - 2010) * 100 + int(x.strftime('%U')))

# add user-week
cdf['user_week'] = cdf.user_id * 1000 + cdf.week

# cleanup columns
to_drop = ['has_commands', 'user_type', 'custid', 'runtime', 'state', 'auto_proj',
           'guid', 'instid', 'app', 'start_user', 'iu_name', 'created_time']
cdf = cdf.drop(to_drop, axis=1)

# merge sdf and sddf
df = pd.merge(cdf, sddf, on='session_id')
len(df)

Number of customer sessions: 348880
Customer sessions this semester 49911
Sessions with commands 39925


964671
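
The preparation in the cell above repeats the academic steps from `In [3]` and `In [4]` with a different `user_type`. A hypothetical helper, `prepare_sessions`, could hold that logic once. The name, its parameters and the toy frame are all assumptions for illustration, and the sketch leaves out the column cleanup:

```python
import pandas as pd

def prepare_sessions(sdf, user_type, since, app='M'):
    """Hypothetical helper: filter sessions by type, app and start date,
    keep those with commands, and add the week and user_week keys."""
    out = sdf[(sdf.user_type == user_type) & (sdf.app == app)]
    out = out[out.created_time >= since]
    out = out[out.has_commands].copy()  # copy before adding columns
    out['created_time'] = pd.to_datetime(out.created_time)
    week_no = out.created_time.dt.strftime('%U').astype(int)
    out['week'] = (out.created_time.dt.year - 2010) * 100 + week_no
    out['user_week'] = out.user_id * 1000 + out.week
    return out

# Toy session table; every value is invented for illustration.
toy = pd.DataFrame({
    'session_id': [1, 2, 3, 4],
    'user_id': [10, 11, 12, 13],
    'user_type': ['academic', 'customer', 'customer', 'customer'],
    'app': ['M', 'M', 'M', 'V'],
    'created_time': ['2018-09-01 10:00:00', '2018-09-02 11:00:00',
                     '2018-07-01 12:00:00', '2018-09-03 13:00:00'],
    'has_commands': [True, True, True, True],
})

academic = prepare_sessions(toy, 'academic', '2018-08-15')
commercial = prepare_sessions(toy, 'customer', '2018-08-15')
print(academic.session_id.tolist(), commercial.session_id.tolist())
```

Running both populations through one function makes it harder for the two pipelines to drift apart.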

In [17]:
df_noex = df[df.proj_name.isnull()]
len(df_noex)

934250

In [18]:
v = 0
m = 0
users = set()
for uid, udf in df_noex.groupby('user_id'):
    rv = placed_vss_element(udf)
    v += rv
    rm = placed_mwo_element(udf)
    m += rm
    if rm or rv:
        users.add(uid)
    
tot_users = len(df_noex.user_id.unique())
with_ele = len(users)

md("""
First, let's look at the number of users that have placed elements in a schematic or
system diagram. This gives us an indication of creation of a new document and eliminates
the people that are just opening documents and running them.

* {} users placed an element in a schematic ({}%)
* {} users placed an element in a system diagram ({}%)

Note there are {} total users and out of those {} have placed an element of either type.
""".format(m, int(round(100*m/tot_users)),
           v, int(round(100*v/tot_users)), tot_users, with_ele))
s['commercial']['Pct users placing schematic element'] = int(round(100*m/tot_users))
s['commercial']['Pct users placing system element'] = int(round(100*v/tot_users))

In [19]:
v = 0
rfb = 0
rfi = 0
m = 0
for uid, udf in df_noex.groupby('user_id'):
    v += ran_vss_simulation(udf)
    m += ran_linear_simulation(udf)
    rfb += ran_rfb_simulation(udf)
    rfi += ran_rfi_simulation(udf)

s['commercial']['Users linear sim'] = m
s['commercial']['Users system sim'] = v
s['commercial']['Users rfi sim'] = rfi
s['commercial']['Users rfb sim'] = rfb

md("""
Let's look at the number of users that have run simulations and the types
of simulations they run.

* {} users ran a linear simulation
* {} users ran a system simulation
* {} users ran an rfi simulation
* {} users ran an rfb simulation
""".format(m, v, rfi, rfb))


Let's look at the number of users that have run simulations and the types
of simulations they run.

* 890 users ran a linear simulation
* 35 users ran a system simulation
* 16 users ran an rfi simulation
* 42 users ran an rfb simulation


In [20]:
pd.DataFrame(s)

| | academic | commercial |
|---|---|---|
| Pct users placing schematic element | 80 | 74 |
| Pct users placing system element | 12 | 7 |
| Sessions with commands | 23591 | 39925 |
| Users linear sim | 906 | 890 |
| Users rfb sim | 58 | 42 |
| Users rfi sim | 15 | 16 |
| Users system sim | 74 | 35 |
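
To put the raw user counts on a common scale, one option is to express each VSS simulation type as a percentage of the linear-simulation users in the same population. The sketch below copies the counts from the summary table above. Treat the result as a rough ratio: a user who ran several simulation types is counted in each row, so the rows do not add up to distinct VSS users.

```python
import pandas as pd

# Counts copied from the summary table above.
summary = pd.DataFrame({
    'academic':   {'Users linear sim': 906, 'Users system sim': 74,
                   'Users rfb sim': 58, 'Users rfi sim': 15},
    'commercial': {'Users linear sim': 890, 'Users system sim': 35,
                   'Users rfb sim': 42, 'Users rfi sim': 16},
})

# Each VSS simulation type as a percentage of linear-simulation users.
vss_rows = ['Users system sim', 'Users rfb sim', 'Users rfi sim']
pct_of_linear = (100 * summary.loc[vss_rows] / summary.loc['Users linear sim']).round(1)
print(pct_of_linear)
```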

