Use Case
A client has data on users for an application from the past two years. They define an "adopted user" as a user who has logged into the application on three separate days in at least one seven-day period. They want to understand what variables contribute to a user converting into an adopted user. The assignment is to inspect the data and prepare an analysis that shows non-technical stakeholders what variables and conditions are associated with user adoption.

A user table ("takehome_users") with data on 12,000 users who signed up for the product in the last two years. This table includes:
● name: the user's name
● object_id: the user's id
● email: email address
● creation_source: how their account was created. This takes on one of 5 values:
        ○  PERSONAL_PROJECTS: invited to join another user's personal workspace
        ○  GUEST_INVITE: invited to an organization as a guest (limited permissions)
        ○  ORG_INVITE: invited to an organization (as a full member)
        ○  SIGNUP: signed up via the website
        ○  SIGNUP_GOOGLE_AUTH: signed up using Google Authentication (using a Google email account for their login id)
● creation_time: when they created their account
● last_session_creation_time: unix timestamp of last login
● opted_in_to_mailing_list: whether they have opted into receiving marketing emails
● enabled_for_marketing_drip: whether they are on the regular marketing email drip
● org_id: the organization (group of users) they belong to
● invited_by_user_id: which user invited them to join (if applicable)

A usage summary table ("takehome_user_engagement") that has a row for each day that a user logged into the product.

Instructions

Defining an "adopted user" as a user who has logged into the application on three separate days in at least one seven-day period, identify which factors predict future user adoption. Arriving at an answer may look something like this:

Merge, clean, and organize data as necessary

Define a transformation to evaluate which users are adopted users along with other feature engineering

Conduct exploratory data analysis

If necessary, develop a machine-learning model

Produce a report with findings about the influence of different variables with respect to adopted users.

We suggest spending 1-2 hours on this, but you're welcome to spend more or less. Please send us a brief writeup of your findings (the more concise, the better; no more than one page), along with any summary tables, graphs, code, or queries that can help us understand your approach. Please note any factors you considered or investigations you did, even if they did not pan out. Feel free to identify any further research or data you think would be valuable.
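The adopted-user definition in the instructions can be turned into a small labeling function. The sketch below is one possible reading of it, not the notebook's own method: it reduces each login to its calendar day, then checks whether any login day and the user's second-next login day are at most 6 days apart, which means three distinct days fall inside one seven-day period. The function name `find_adopted_users` and the toy rows are made up for illustration; only the column names `user_id` and `time_stamp` come from the engagement table.

```python
import pandas as pd

def find_adopted_users(engagement: pd.DataFrame) -> set:
    """Return ids of users with logins on 3 distinct days within one 7-day period."""
    days = (
        engagement.assign(day=pd.to_datetime(engagement["time_stamp"]).dt.normalize())
        .drop_duplicates(["user_id", "day"])   # one row per user per calendar day
        .sort_values(["user_id", "day"])
    )
    # For each login day, look ahead to the same user's second-next login day.
    third_day = days.groupby("user_id")["day"].shift(-2)
    # Three days fit in one 7-day period when the first and third are <= 6 days apart.
    within_week = (third_day - days["day"]) <= pd.Timedelta(days=6)
    return set(days.loc[within_week, "user_id"])

# Toy data (invented for illustration, not taken from the client's tables).
toy = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "time_stamp": [
        "2014-01-01 09:00", "2014-01-03 10:00", "2014-01-07 23:00",  # 3 days within 7
        "2014-01-01 09:00", "2014-01-05 09:00", "2014-01-08 09:00",  # spread over 8 days
        "2014-01-01 09:00", "2014-01-01 18:00", "2014-01-02 09:00",  # only 2 distinct days
    ],
})
adopted = find_adopted_users(toy)
print(adopted)
```

Working on distinct calendar days rather than raw rows keeps the result correct even if a user has several logins recorded on the same day.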

In [69]:
import pandas as pd
import numpy as np
import datetime
from datetime import timedelta

In [87]:
user_en= pd.read_csv("./takehome_user_engagement.csv")
users= pd.read_csv("./takehome_users.csv",encoding='latin-1')

In [88]:

# Users who signed up on their own have no inviter; mark them with 0.
users['invited_by_user_id'] = users['invited_by_user_id'].fillna(0)

# last_session_creation_time is a Unix timestamp in seconds; missing values become NaT.
users['last_session_creation_time'] = pd.to_datetime(users['last_session_creation_time'], unit='s')
users['creation_time'] = pd.to_datetime(users['creation_time'])
# Time between account creation and last login (NaT for users with no recorded login).
users['inactivity'] = users['last_session_creation_time'] - users['creation_time']


In [89]:

user_en['time_stamp'] = pd.to_datetime(user_en['time_stamp'])
# Keep every login row: the rolling 7-day count in the next cell needs the full history.
user_en.shape


(207917, 3)

In [90]:

# Count each user's logins in the trailing 7-day window ending at each login.
# The table has one row per user per login day, so this is the number of distinct days.
user_en['visits_7_days'] = (
    user_en.groupby('user_id', group_keys=False)
    .apply(lambda g: g.rolling('7D', on='time_stamp')['visited'].count())
)

# A user is adopted once any 7-day window contains at least 3 logins.
adopted_users = user_en[user_en['visits_7_days'] >= 3]
adopted_users = adopted_users.drop_duplicates('user_id', keep='first')

adopted_users_list = adopted_users.user_id.tolist()
len(adopted_users_list)


In [93]:
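# Keep only the users flagged as adopted.
# Note: users who never adopted are therefore absent from the merged table below.
users = users[users.object_id.isin(adopted_users_list)]
len(users)


user_en.drop(['visited'], axis=1, inplace=True)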

In [94]:

users_f = users.merge(user_en, how = 'left', left_on = 'object_id', right_on = 'user_id')
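The merge above produces one row per login, so frequent users contribute many rows. A possible alternative, sketched here under assumptions, keeps one row per user and flags adoption by membership in the adopted-id list, which also makes a per-group adoption rate easy to read off. The frame `users_toy`, the list `adopted_ids`, and all values in them are hypothetical stand-ins, not the client's data.

```python
import pandas as pd

# Toy stand-ins for the users table and the adopted-id list (values invented).
users_toy = pd.DataFrame({
    "object_id": [1, 2, 3, 4],
    "creation_source": ["ORG_INVITE", "SIGNUP", "ORG_INVITE", "GUEST_INVITE"],
})
adopted_ids = [1, 4]  # hypothetical output of the adoption step

# One row per user: flag membership in the adopted list instead of merging logins.
user_level = users_toy.copy()
user_level["is_adopted"] = user_level["object_id"].isin(adopted_ids).astype(int)

# Share of adopted users within each creation source.
rate_by_source = user_level.groupby("creation_source")["is_adopted"].mean()
print(rate_by_source)
```

Because every user appears exactly once, the group means are adoption rates per user rather than per login.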

In [105]:
users_f

Unnamed: 0,object_id,creation_time,creation_source,last_session_creation_time,opted_in_to_mailing_list,enabled_for_marketing_drip,org_id,invited_by_user_id,time_stamp,user_id,visits_7_days,is_adopted
0,2,1.384467e+09,ORG_INVITE,1.396238e+09,0,0,1,316.0,1.384467e+09,2,1.0,0.0
1,2,1.384467e+09,ORG_INVITE,1.396238e+09,0,0,1,316.0,1.385677e+09,2,1.0,0.0
2,2,1.384467e+09,ORG_INVITE,1.396238e+09,0,0,1,316.0,1.386541e+09,2,1.0,0.0
3,2,1.384467e+09,ORG_INVITE,1.396238e+09,0,0,1,316.0,1.387923e+09,2,1.0,0.0
4,2,1.384467e+09,ORG_INVITE,1.396238e+09,0,0,1,316.0,1.388442e+09,2,2.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...
197913,11988,1.394862e+09,PERSONAL_PROJECTS,1.401621e+09,0,0,114,0.0,1.400823e+09,11988,6.0,1.0
197914,11988,1.394862e+09,PERSONAL_PROJECTS,1.401621e+09,0,0,114,0.0,1.400910e+09,11988,6.0,1.0
197915,11988,1.394862e+09,PERSONAL_PROJECTS,1.401621e+09,0,0,114,0.0,1.401082e+09,11988,5.0,1.0
197916,11988,1.394862e+09,PERSONAL_PROJECTS,1.401621e+09,0,0,114,0.0,1.401169e+09,11988,5.0,1.0


In [95]:
# Convert the datetime columns back to Unix seconds, reading the stored datetimes as UTC.
# Missing values (NaT) become NaN and are filled in the next cell.
epoch = pd.Timestamp("1970-01-01")
for col in ['last_session_creation_time', 'time_stamp', 'creation_time']:
    users_f[col] = (users_f[col] - epoch) / pd.Timedelta(seconds=1)


In [96]:
users_f = users_f.fillna(0)

In [97]:

users_f.drop(['name','email','inactivity'],axis=1, inplace=True)
users_f.head()

Unnamed: 0,object_id,creation_time,creation_source,last_session_creation_time,opted_in_to_mailing_list,enabled_for_marketing_drip,org_id,invited_by_user_id,time_stamp,user_id,visits_7_days
0,2,1384467000.0,ORG_INVITE,1396238000.0,0,0,1,316.0,1384467000.0,2,1.0
1,2,1384467000.0,ORG_INVITE,1396238000.0,0,0,1,316.0,1385677000.0,2,1.0
2,2,1384467000.0,ORG_INVITE,1396238000.0,0,0,1,316.0,1386541000.0,2,1.0
3,2,1384467000.0,ORG_INVITE,1396238000.0,0,0,1,316.0,1387923000.0,2,1.0
4,2,1384467000.0,ORG_INVITE,1396238000.0,0,0,1,316.0,1388442000.0,2,2.0


In [98]:
# Flag each login row: 1 if the trailing 7-day login count is at least 3, else 0.
# Assigning the whole column at once avoids chained indexing and the SettingWithCopyWarning it triggers.
users_f['is_adopted'] = (users_f['visits_7_days'] >= 3).astype(float)
(users_f['is_adopted'] == 1).sum()


160522

In [99]:
users_f

Unnamed: 0,object_id,creation_time,creation_source,last_session_creation_time,opted_in_to_mailing_list,enabled_for_marketing_drip,org_id,invited_by_user_id,time_stamp,user_id,visits_7_days,is_adopted
0,2,1.384467e+09,ORG_INVITE,1.396238e+09,0,0,1,316.0,1.384467e+09,2,1.0,0.0
1,2,1.384467e+09,ORG_INVITE,1.396238e+09,0,0,1,316.0,1.385677e+09,2,1.0,0.0
2,2,1.384467e+09,ORG_INVITE,1.396238e+09,0,0,1,316.0,1.386541e+09,2,1.0,0.0
3,2,1.384467e+09,ORG_INVITE,1.396238e+09,0,0,1,316.0,1.387923e+09,2,1.0,0.0
4,2,1.384467e+09,ORG_INVITE,1.396238e+09,0,0,1,316.0,1.388442e+09,2,2.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...
197913,11988,1.394862e+09,PERSONAL_PROJECTS,1.401621e+09,0,0,114,0.0,1.400823e+09,11988,6.0,1.0
197914,11988,1.394862e+09,PERSONAL_PROJECTS,1.401621e+09,0,0,114,0.0,1.400910e+09,11988,6.0,1.0
197915,11988,1.394862e+09,PERSONAL_PROJECTS,1.401621e+09,0,0,114,0.0,1.401082e+09,11988,5.0,1.0
197916,11988,1.394862e+09,PERSONAL_PROJECTS,1.401621e+09,0,0,114,0.0,1.401169e+09,11988,5.0,1.0


In [106]:
from sklearn.model_selection import train_test_split
# X = users_f.drop(['is_adopted'],axis=1)
# y = users_f.is_adopted
# X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.3, random_state=30, stratify=y)
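The commented-out split above could feed a simple model. The following is a minimal sketch, assuming a user-level table with one row per user; it is not a result from this notebook. The data is random and purely illustrative, so the coefficients mean nothing. Logistic regression is swapped in as an example of a model whose coefficients are easy to explain to non-technical stakeholders, and `demo` is a hypothetical frame.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Random illustrative data (assumption: one row per user, binary target).
rng = np.random.default_rng(30)
n = 200
demo = pd.DataFrame({
    "creation_source": rng.choice(["ORG_INVITE", "SIGNUP", "GUEST_INVITE"], size=n),
    "opted_in_to_mailing_list": rng.integers(0, 2, size=n),
    "is_adopted": rng.integers(0, 2, size=n),
})

# One-hot encode the categorical column so it carries no artificial ordering.
X = pd.get_dummies(demo.drop(columns="is_adopted"), columns=["creation_source"], dtype=int)
y = demo["is_adopted"]

# Stratify on the target so both splits keep the same adopted / non-adopted ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=30, stratify=y
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
# One coefficient per feature; the sign shows the direction of the association.
coefs = pd.Series(model.coef_[0], index=X.columns)
print(coefs.sort_values())
```

One-hot encoding is used here instead of `LabelEncoder` because a linear model would otherwise treat the integer codes of `creation_source` as an ordered quantity.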

In [107]:
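from sklearn.preprocessing import LabelEncoder

# Assumption: df2 is the raw users table and df1 the raw engagement table;
# neither name is defined in the cells shown above.
df2 = pd.read_csv("./takehome_users.csv", encoding='latin-1')
df1 = pd.read_csv("./takehome_user_engagement.csv")

lb_make = LabelEncoder()
df2["creation_source"] = lb_make.fit_transform(df2["creation_source"])
df2['last_session_creation_time'] = pd.to_datetime(df2['last_session_creation_time'], unit='s')
# df2['source_of_channel']=df2['email'].apply(lambda x:x.split("@")[-1].replace(".com",""))

# The users table is keyed on object_id, not user_id; merging on 'user_id' alone raises KeyError: 'user_id'.
df2_en = df2.merge(df1, left_on='object_id', right_on='user_id', how='left')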