## Use Case
A client has data on users for an application from the past two years. They define an "adopted user" as a user who has logged into the application on three separate days in at least one seven ­day period. They want to understand what variables contribute to a user converting into an adopted user. The assignment is to inspect the data and prepare an analysis that shows non-technical stakeholders what variables and conditions are associated with user adoption.




A user table ("takehome_users") with data on 12,000 users who signed up for the product in the last two years. https://gist.github.com/raslan066 This table includes:                                                                      
●  name: the user's name     
●  object_id: the user's id                                                     
●  email: email address                                             
●  creation_source: how their account was created. This takes on one of 5 values:
            
            ○  PERSONAL_PROJECTS: invited to join another user's personal workspace
            ○  GUEST_INVITE: invited to an organization as a guest (limited permissions)            
            ○  ORG_INVITE: invited to an organization (as a full member)            
            ○  SIGNUP: signed up via the website
            ○  SIGNUP_GOOGLE_AUTH: signed up using Google Authentication (using a Google email account for their login id)                                                                              
●  creation_time: when they created their account                                                             
●  last_session_creation_time: unix timestamp of last login                                             
●  opted_in_to_mailing_list: whether they have opted into receiving marketing emails

●  enabled_for_marketing_drip: whether they are on the regular marketing email drip       
●  org_id: the organization (group of users) they belong to                                     
●  invited_by_user_id: which user invited them to join (if applicable)
A usage summary table ("takehome_user_engagement") that has a row for each day that a user logged into the product.
Instructions

Defining an "adopted user" as a user who has logged into the application on three separate days in at least one seven ­day period, identify which factors predict future user adoption. Arriving at an answer may look something like this:

Merge, clean, and organize data as necessary

Define a transformation to evaluate which users are adopted users along with other feature engineering

Conduct exploratory data analysis

If necessary, develop a machine-learning model

Produce a report with findings about the influence of different variables with respect to adopted users.
We suggest spending 1­-2 hours on this, but you're welcome to spend more or less. Please send us a brief writeup of your findings (the more concise, the better -­­ no more than one page), along with any summary tables, graphs, code, or queries that can help us understand your approach. Please note any factors you considered or investigations you did, even if they did not pan out. Feel free to identify any further research or data you think would be valuable. 



In [154]:
import pandas as pd
import numpy as np 

In [394]:

df1=pd.read_csv(r"./takehome_user_engagement.csv")
df2=pd.read_csv(r"./takehome_users.csv",encoding='latin-1')
df1['date']=pd.to_datetime(df1['time_stamp']).dt.strftime('%d-%m-%Y')
df1=df1.set_index("time_stamp")
display(df1)

Unnamed: 0_level_0,user_id,visited,date
time_stamp,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
2014-04-22 03:53:30,1,1,22-04-2014
2013-11-15 03:45:04,2,1,15-11-2013
2013-11-29 03:45:04,2,1,29-11-2013
2013-12-09 03:45:04,2,1,09-12-2013
2013-12-25 03:45:04,2,1,25-12-2013
...,...,...,...
2013-09-06 06:14:15,11996,1,06-09-2013
2013-01-15 18:28:37,11997,1,15-01-2013
2014-04-27 12:45:16,11998,1,27-04-2014
2012-06-02 11:55:59,11999,1,02-06-2012


In [395]:
try:
    df1=df1.reset_index()
    
    df1['time_stamp']=pd.to_datetime(df1['time_stamp'])
    df1=df1.set_index("time_stamp")
except Exception as e:
    print("not process",e)
    pass 


In [396]:
df1

Unnamed: 0_level_0,user_id,visited,date
time_stamp,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
2014-04-22 03:53:30,1,1,22-04-2014
2013-11-15 03:45:04,2,1,15-11-2013
2013-11-29 03:45:04,2,1,29-11-2013
2013-12-09 03:45:04,2,1,09-12-2013
2013-12-25 03:45:04,2,1,25-12-2013
...,...,...,...
2013-09-06 06:14:15,11996,1,06-09-2013
2013-01-15 18:28:37,11997,1,15-01-2013
2014-04-27 12:45:16,11998,1,27-04-2014
2012-06-02 11:55:59,11999,1,02-06-2012


In [397]:

df1['year']=df1.index.year
df1['month']=df1.index.month
df1['week']=df1.index.isocalendar().week
df1['day']=df1.index.day
df1['day_name']=df1.index.day_name()


In [386]:
df1


Unnamed: 0,level_0,index,user_id,visited,date,year,month,week,day,day_name
0,0,0,1,1,22-04-2014,2014,4,17,22,Tuesday
1,1,1,2,1,15-11-2013,2013,11,46,15,Friday
2,2,2,2,1,29-11-2013,2013,11,48,29,Friday
3,3,3,2,1,09-12-2013,2013,12,50,9,Monday
4,4,4,2,1,25-12-2013,2013,12,52,25,Wednesday
...,...,...,...,...,...,...,...,...,...,...
207912,207912,207912,11996,1,06-09-2013,2013,9,36,6,Friday
207913,207913,207913,11997,1,15-01-2013,2013,1,3,15,Tuesday
207914,207914,207914,11998,1,27-04-2014,2014,4,17,27,Sunday
207915,207915,207915,11999,1,02-06-2012,2012,6,22,2,Saturday


In [354]:
### as we require distinct day 

## lets see if any 
# df1[df1.duplicated()] ### as it returns null no distinct day with same id is present  

# filter by date and id if present  
# np.unique(df1[["user_id",'time_stamp']]) ## dates cannot possible with numpy 
# df1[["user_id",'time_stamp']]=df1[["user_id",'time_stamp']].drop_duplicates()

## 
if week is same for same userid and visited >3  


In [385]:
try:
    df1=df1.reset_index()
#     df1=df1.drop(columns=['time_stamp'])
    df1[df1[['user_id']].duplicated()]     ## it returning Null as we can see no duplicates are found 
except Exception as e:
    print(e)
    pass 



In [382]:
# f=df1.groupby(['user_id','date']).agg(sum)
# f

In [391]:
df1=df1.reset_index()

In [405]:
f=df1.groupby(['user_id','year','week']).sum()
f=f.reset_index()

In [406]:
display(f.head(2),t.head(2))
# df1=df1.drop(columns=['visited'])


Unnamed: 0,user_id,year,week,visited,date,month,day,day_name
0,1,2014,17,1,22-04-2014,4,22,Tuesday
1,2,2013,1,1,31-12-2013,12,31,Tuesday


Unnamed: 0,visited,day,day_name
0,1,22,Tuesday
1,1,15,Friday


In [407]:
# f.reset_index().value_counts()
f[f['visited']>3]

Unnamed: 0,user_id,year,week,visited,date,month,day,day_name
33,10,2013,18,4,30-04-201301-05-201302-05-201303-05-2013,19,36,TuesdayWednesdayThursdayFriday
34,10,2013,19,5,06-05-201307-05-201308-05-201310-05-201312-05-...,25,43,MondayTuesdayWednesdayFridaySunday
38,10,2013,23,4,03-06-201304-06-201307-06-201309-06-2013,24,23,MondayTuesdayFridaySunday
39,10,2013,24,5,10-06-201311-06-201313-06-201315-06-201316-06-...,30,65,MondayTuesdayThursdaySaturdaySunday
42,10,2013,27,4,03-07-201304-07-201305-07-201306-07-2013,28,18,WednesdayThursdayFridaySaturday
...,...,...,...,...,...,...,...,...
69232,11975,2014,19,7,05-05-201406-05-201407-05-201408-05-201409-05-...,35,56,MondayTuesdayWednesdayThursdayFridaySaturdaySu...
69233,11975,2014,20,7,12-05-201413-05-201414-05-201415-05-201416-05-...,35,105,MondayTuesdayWednesdayThursdayFridaySaturdaySu...
69252,11988,2014,12,4,17-03-201418-03-201419-03-201422-03-2014,12,76,MondayTuesdayWednesdaySaturday
69258,11988,2014,18,4,28-04-201429-04-201430-04-201404-05-2014,17,91,MondayTuesdayWednesdaySunday
