# <center> Relax Challenge


The  data  is  available  as  two  attached  CSV  files:

*takehome_user_engagement. csv*

*takehome_users . csv*

The  data  has  the  following  two  tables:

1 - A  user  table  ( "takehome_users" )  with  data  on  12,000  users  who  signed  up  for  the
product  in  the  last  two  years.   This  table  includes:

-  **name**:  the  user's  name
-  **object_id**:   the  user's  id
-  **email**:  email  address
-  **creation_source**:   how  their  account  was  created.  This  takes  on  one of  5  values:
    - **PERSONAL_PROJECTS**:  invited  to  join  another  user's personal  workspace
    - **GUEST_INVITE**:  invited  to  an  organization  as  a  guest (limited  permissions)
    - **ORG_INVITE**:  invited  to  an  organization  (as  a  full  member)
    - **SIGNUP**:  signed  up  via  the  website
    - **SIGNUP_GOOGLE_AUTH**:  signed  up  using  Google Authentication  (using  a  Google  email  account  for  their  login id)
-  **creation_time**:  when  they  created  their  account
-  **last_session_creation_time**:   unix  timestamp  of  last  login
-  **opted_in_to_mailing_list**:  whether  they  have  opted  into  receiving marketing  emails
-  **enabled_for_marketing_drip**:  whether  they  are  on  the  regular marketing  email  drip
-  **org_id**:   the  organization  (group  of  users)  they  belong  to
-  **invited_by_user_id**:   which  user  invited  them  to  join  (if  applicable).

2 - A  usage  summary  table  ( "takehome_user_engagement" )  that  has  a  row  for  each  day
that  a  user  logged  into  the  product.

Defining  an  "adopted  user"   as  a  user  who   has  logged  into  the  product  on  three  separate
days  in  at  least  one  seven-day  period ,  identify  which  factors  predict  future  user
adoption.

We  suggest  spending  1-2  hours  on  this,  but  you're  welcome  to  spend  more  or  less.
Please  send  us  a  brief  writeup  of  your  findings  (the  more  concise,  the  better  --  no  more
than  one  page),  along  with  any  summary  tables,  graphs,  code,  or  queries  that  can  help
us  understand  your  approach.  Please  note  any  factors  you  considered  or  investigation
you  did,  even  if  they  did  not  pan  out.  Feel  free  to  identify  any  further  research  or  data
you  think  would  be  valuable.

### 1.0 Import Libraries and load the data

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
pd.options.display.max_columns = 500

%matplotlib inline

In [2]:
# Load the two files into dataframes and see how the data is structured and organized:
df_users = pd.read_csv('takehome_users.csv', encoding='latin-1', index_col=0)
df_users.head()

Unnamed: 0_level_0,creation_time,name,email,creation_source,last_session_creation_time,opted_in_to_mailing_list,enabled_for_marketing_drip,org_id,invited_by_user_id
object_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1
1,2014-04-22 03:53:30,Clausen August,AugustCClausen@yahoo.com,GUEST_INVITE,1398139000.0,1,0,11,10803.0
2,2013-11-15 03:45:04,Poole Matthew,MatthewPoole@gustr.com,ORG_INVITE,1396238000.0,0,0,1,316.0
3,2013-03-19 23:14:52,Bottrill Mitchell,MitchellBottrill@gustr.com,ORG_INVITE,1363735000.0,0,0,94,1525.0
4,2013-05-21 08:09:28,Clausen Nicklas,NicklasSClausen@yahoo.com,GUEST_INVITE,1369210000.0,0,0,1,5151.0
5,2013-01-17 10:14:20,Raw Grace,GraceRaw@yahoo.com,GUEST_INVITE,1358850000.0,0,0,193,5240.0


In [3]:
df_engage = pd.read_csv('takehome_user_engagement.csv')
df_engage.head()

Unnamed: 0,time_stamp,user_id,visited
0,2014-04-22 03:53:30,1,1
1,2013-11-15 03:45:04,2,1
2,2013-11-29 03:45:04,2,1
3,2013-12-09 03:45:04,2,1
4,2013-12-25 03:45:04,2,1


### 1.1 Building the "Adopted User" Label

Initially, I'll construct the "adopted user" label from the user engagement dataset, which I've loaded into the df_engage data frame as illustrated above. Let's verify that there's no missing data in this dataset:

In [4]:
# checking for any missing data
df_engage.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 207917 entries, 0 to 207916
Data columns (total 3 columns):
 #   Column      Non-Null Count   Dtype 
---  ------      --------------   ----- 
 0   time_stamp  207917 non-null  object
 1   user_id     207917 non-null  int64 
 2   visited     207917 non-null  int64 
dtypes: int64(2), object(1)
memory usage: 4.8+ MB


We're in possession of a dataset containing 207,917 rows of system logins, all complete with no missing values. Although the 'time_stamp' column supplies the date and time for each login, it was initially loaded as strings into the dataframe. My subsequent step involves converting this data into DateTime format and assigning it as the index for the dataframe:

In [5]:
# convert from string to datetime format
df_engage['time_stamp'] = pd.to_datetime(df_engage['time_stamp'])

In [6]:
# set the time_stamp column as the dataframe index
df_engage = df_engage.set_index('time_stamp')
df_engage.head()

Unnamed: 0_level_0,user_id,visited
time_stamp,Unnamed: 1_level_1,Unnamed: 2_level_1
2014-04-22 03:53:30,1,1
2013-11-15 03:45:04,2,1
2013-11-29 03:45:04,2,1
2013-12-09 03:45:04,2,1
2013-12-25 03:45:04,2,1


With the DateTime index established for the dataframe, I can employ the .resample() method to aggregate the dates into weekly intervals. Although this method might not precisely capture every 7-day period in which a user adopted the product, it should encompass the majority of users who engage three times a week. Hence, it should suffice for this preliminary analysis. Initially, I'll group the data by 'user_id', then resample the dates, and sum the data, thereby utilizing the 'visited' column to identify adopted users:

In [7]:
# group data by user_id, resample into weekly dates, and sum the visited column
df_engage = df_engage.groupby('user_id').resample('1W').sum()
df_engage.head()

Unnamed: 0_level_0,Unnamed: 1_level_0,user_id,visited
user_id,time_stamp,Unnamed: 2_level_1,Unnamed: 3_level_1
1,2014-04-27,1,1
2,2013-11-17,2,1
2,2013-11-24,0,0
2,2013-12-01,2,1
2,2013-12-08,0,0


 Next, isolate the particular users that have logged into the product at least three times in a week:

In [8]:
# create dataframe of only the adopted users
df_adopted = df_engage[df_engage['visited'] > 2]
del df_adopted['user_id']
df_adopted.head()

Unnamed: 0_level_0,Unnamed: 1_level_0,visited
user_id,time_stamp,Unnamed: 2_level_1
2,2014-02-09,3
10,2013-03-03,3
10,2013-04-14,3
10,2013-04-28,3
10,2013-05-05,4


We've accumulated 33,859 weekly data points reflecting over 3 logins to the product within a week. However, it's important to note that not all of these data points correspond to unique users. Consequently, I'll compile a list of adopted user IDs to serve as the basis for creating labels in the other dataset:

In [9]:
# Create a list of adopted user IDs from the DataFrame index
adopted_index = list(df_adopted.index)
adopted_users_list = []
for i in range(len(adopted_index)):
    adopted_users_list.append(adopted_index[i][0])
len(adopted_users_list)

33859

In [10]:
# Convert the list of adopted user IDs into a set to remove duplicates
adopted_users = set(adopted_users_list)
len(adopted_users)

1445

We've determined that 1,445 out of 12,000 users have adopted the product, constituting a modest 12% of the total users. Now, let's incorporate the adopted label into the df_users dataframe:

In [11]:
# Add an 'adopted' column to the df_users dataframe, initially setting all values to False
df_users['adopted'] = False

# Update the 'adopted' column to True for users in the adopted_users set
for user in adopted_users:
    df_users.loc[user, 'adopted'] = True

# Display the first 5 rows of the updated df_users dataframe
df_users.head()

Unnamed: 0_level_0,creation_time,name,email,creation_source,last_session_creation_time,opted_in_to_mailing_list,enabled_for_marketing_drip,org_id,invited_by_user_id,adopted
object_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1
1,2014-04-22 03:53:30,Clausen August,AugustCClausen@yahoo.com,GUEST_INVITE,1398139000.0,1,0,11,10803.0,False
2,2013-11-15 03:45:04,Poole Matthew,MatthewPoole@gustr.com,ORG_INVITE,1396238000.0,0,0,1,316.0,True
3,2013-03-19 23:14:52,Bottrill Mitchell,MitchellBottrill@gustr.com,ORG_INVITE,1363735000.0,0,0,94,1525.0,False
4,2013-05-21 08:09:28,Clausen Nicklas,NicklasSClausen@yahoo.com,GUEST_INVITE,1369210000.0,0,0,1,5151.0,False
5,2013-01-17 10:14:20,Raw Grace,GraceRaw@yahoo.com,GUEST_INVITE,1358850000.0,0,0,193,5240.0,False


### 2.0 Exploratory Data Analysis

In [12]:
# Display a summary of the df_users dataframe, including the data types and non-null counts for each column
df_users.info()

<class 'pandas.core.frame.DataFrame'>
Index: 12000 entries, 1 to 12000
Data columns (total 10 columns):
 #   Column                      Non-Null Count  Dtype  
---  ------                      --------------  -----  
 0   creation_time               12000 non-null  object 
 1   name                        12000 non-null  object 
 2   email                       12000 non-null  object 
 3   creation_source             12000 non-null  object 
 4   last_session_creation_time  8823 non-null   float64
 5   opted_in_to_mailing_list    12000 non-null  int64  
 6   enabled_for_marketing_drip  12000 non-null  int64  
 7   org_id                      12000 non-null  int64  
 8   invited_by_user_id          6417 non-null   float64
 9   adopted                     12000 non-null  bool   
dtypes: bool(1), float64(2), int64(3), object(4)
memory usage: 1.2+ MB


It appears that two columns have missing data values. The 'last_session_creation_time' column is missing data for roughly one-third of the users. I have decided not to use this feature in our model. This decision is based on the likelihood that this feature is correlated with our adopted user label (adopted users are likely to have more recent last sessions). Including this feature could limit the significance of our model results. In other words, it wouldn’t be very insightful to build a model that simply tells us that users who logged in more recently are more likely to have adopted the product.

Next, let's investigate the missing values in the 'invited_by_user_id' feature.

In [13]:
# Display the count of each value in the 'invited_by_user_id' column, including NaN values, sorted in descending order
df_users['invited_by_user_id'].value_counts(ascending=False, dropna=False)

invited_by_user_id
NaN        5583
10741.0      13
2527.0       12
1525.0       11
2308.0       11
           ... 
2071.0        1
1390.0        1
5445.0        1
8526.0        1
5450.0        1
Name: count, Length: 2565, dtype: int64

The __'invited_by_user_id'__ feature indicates the specific user_id that invited the user or provides a NaN value if the user was not invited by another user. I will convert this feature into a boolean value (indicating whether the user was invited by another existing user) for use in the model.

In [14]:
# Create a new column 'invited_by_user' indicating whether the 'invited_by_user_id' is not null
df_users['invited_by_user'] = df_users['invited_by_user_id'].notnull()

# Remove the 'invited_by_user_id' column from the dataframe
del df_users['invited_by_user_id']

# Display the first few rows of the updated df_users dataframe
df_users.head()

Unnamed: 0_level_0,creation_time,name,email,creation_source,last_session_creation_time,opted_in_to_mailing_list,enabled_for_marketing_drip,org_id,adopted,invited_by_user
object_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1
1,2014-04-22 03:53:30,Clausen August,AugustCClausen@yahoo.com,GUEST_INVITE,1398139000.0,1,0,11,False,True
2,2013-11-15 03:45:04,Poole Matthew,MatthewPoole@gustr.com,ORG_INVITE,1396238000.0,0,0,1,True,True
3,2013-03-19 23:14:52,Bottrill Mitchell,MitchellBottrill@gustr.com,ORG_INVITE,1363735000.0,0,0,94,False,True
4,2013-05-21 08:09:28,Clausen Nicklas,NicklasSClausen@yahoo.com,GUEST_INVITE,1369210000.0,0,0,1,False,True
5,2013-01-17 10:14:20,Raw Grace,GraceRaw@yahoo.com,GUEST_INVITE,1358850000.0,0,0,193,False,True


### 2.1 Email addresses

The dataset also includes individual email addresses for each user. While the specific email addresses are not likely to be useful for the model, identifying the most common email domains among the product users could be beneficial. Let’s extract this information.

In [15]:
# Split the 'email' column at '@' and create a new column 'email_split' to store the result
df_users['email_split'] = df_users['email'].str.split('@')

# Remove the 'email' column from the dataframe
del df_users['email']

# Extract the email domain from 'email_split' and create a new column 'email_domain'
for user in range(len(df_users)):
    df_users.loc[user+1, 'email_domain'] = df_users.loc[user+1, 'email_split'][1]

# Remove the 'email_split' column from the dataframe
del df_users['email_split']

# Display the top 10 most common email domains
df_users['email_domain'].value_counts().head(10)

email_domain
gmail.com         3562
yahoo.com         2447
jourrapide.com    1259
cuvox.de          1202
gustr.com         1179
hotmail.com       1165
rerwl.com            2
oqpze.com            2
qgjbc.com            2
dqwln.com            2
Name: count, dtype: int64

There appear to be six common email domains among the users. To use this data as a categorical feature in the model, we'll group the less common domains into a single 'Other' category.

In [16]:
# Initialize the 'email' column with the value 'other'
df_users['email'] = 'other'

# Update the 'email' column with specific email domains for matching users
for user in range(len(df_users)):
    if df_users.loc[user+1, 'email_domain'] == 'gmail.com':
        df_users.loc[user+1, 'email'] = 'gmail.com'
    if df_users.loc[user+1, 'email_domain'] == 'yahoo.com':
        df_users.loc[user+1, 'email'] = 'yahoo.com'
    if df_users.loc[user+1, 'email_domain'] == 'jourrapide.com':
        df_users.loc[user+1, 'email'] = 'jourrapide.com'
    if df_users.loc[user+1, 'email_domain'] == 'cuvox.de':
        df_users.loc[user+1, 'email'] = 'cuvox.de'
    if df_users.loc[user+1, 'email_domain'] == 'gustr.com':
        df_users.loc[user+1, 'email'] = 'gustr.com'
    if df_users.loc[user+1, 'email_domain'] == 'hotmail.com':
        df_users.loc[user+1, 'email'] = 'hotmail.com'

# Remove the 'email_domain' column from the dataframe
del df_users['email_domain']

# Display the count of each value in the 'email' column, sorted in descending order
df_users['email'].value_counts(ascending=False)

email
gmail.com         3562
yahoo.com         2447
jourrapide.com    1259
cuvox.de          1202
other             1186
gustr.com         1179
hotmail.com       1165
Name: count, dtype: int64

In [17]:
# Display the first few rows of the updated df_users dataframe
df_users.head()

Unnamed: 0_level_0,creation_time,name,creation_source,last_session_creation_time,opted_in_to_mailing_list,enabled_for_marketing_drip,org_id,adopted,invited_by_user,email
object_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1
1,2014-04-22 03:53:30,Clausen August,GUEST_INVITE,1398139000.0,1,0,11,False,True,yahoo.com
2,2013-11-15 03:45:04,Poole Matthew,ORG_INVITE,1396238000.0,0,0,1,True,True,gustr.com
3,2013-03-19 23:14:52,Bottrill Mitchell,ORG_INVITE,1363735000.0,0,0,94,False,True,gustr.com
4,2013-05-21 08:09:28,Clausen Nicklas,GUEST_INVITE,1369210000.0,0,0,1,False,True,yahoo.com
5,2013-01-17 10:14:20,Raw Grace,GUEST_INVITE,1358850000.0,0,0,193,False,True,yahoo.com


### 3.0 Pre-processing

Prepare the data for building a predictive model. To start, I will create a new dataframe called df_ml. This dataframe will have our label, the 'adopted user' feature, as the first column, and the 'user_id' as the index:

In [18]:
# Create a new DataFrame 'df_ml' with 'adopted_user' column and 'user_id' as index
df_ml = pd.DataFrame({'adopted_user': df_users['adopted'].values}, index=df_users.index)

# Display the df_ml DataFrame
df_ml

Unnamed: 0_level_0,adopted_user
object_id,Unnamed: 1_level_1
1,False
2,True
3,False
4,False
5,False
...,...
11996,False
11997,False
11998,False
11999,False


### 3.1 Binary Features

There are a number of binary categorical variables that simply need to be added to the dataframe in their current format:

In [19]:
# Add columns 'mailing_list', 'marketing_drip', and 'invited_by_user' to df_ml
df_ml['mailing_list'] = df_users['opted_in_to_mailing_list']
df_ml['marketing_drip'] = df_users['enabled_for_marketing_drip']
df_ml['invited_by_user'] = df_users['invited_by_user']

# Display the updated df_ml DataFrame
df_ml

Unnamed: 0_level_0,adopted_user,mailing_list,marketing_drip,invited_by_user
object_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
1,False,1,0,True
2,True,0,0,True
3,False,0,0,True
4,False,0,0,True
5,False,0,0,True
...,...,...,...,...
11996,False,0,0,True
11997,False,0,0,False
11998,False,1,1,True
11999,False,0,0,False


### 3.2 Categorical Variables

The dataset also has a number of categorical variables. I will convert these into dummy variables and add them to the df_ml dataframe:

In [20]:
# Convert 'org_id' column to string type
df_users['org_id'] = df_users['org_id'].astype('str')

# Create dummy variables for categorical columns 'creation_source', 'org_id', and 'email'
df_dummies = pd.get_dummies(df_users[['creation_source', 'org_id', 'email']])

# Concatenate df_ml with df_dummies
df_ml = pd.concat([df_ml, df_dummies], axis=1)

# Display the updated df_ml DataFrame
df_ml.head()

Unnamed: 0_level_0,adopted_user,mailing_list,marketing_drip,invited_by_user,creation_source_GUEST_INVITE,creation_source_ORG_INVITE,creation_source_PERSONAL_PROJECTS,creation_source_SIGNUP,creation_source_SIGNUP_GOOGLE_AUTH,org_id_0,org_id_1,org_id_10,org_id_100,org_id_101,org_id_102,org_id_103,org_id_104,org_id_105,org_id_106,org_id_107,org_id_108,org_id_109,org_id_11,org_id_110,org_id_111,org_id_112,org_id_113,org_id_114,org_id_115,org_id_116,org_id_117,org_id_118,org_id_119,org_id_12,org_id_120,org_id_121,org_id_122,org_id_123,org_id_124,org_id_125,org_id_126,org_id_127,org_id_128,org_id_129,org_id_13,org_id_130,org_id_131,org_id_132,org_id_133,org_id_134,org_id_135,org_id_136,org_id_137,org_id_138,org_id_139,org_id_14,org_id_140,org_id_141,org_id_142,org_id_143,org_id_144,org_id_145,org_id_146,org_id_147,org_id_148,org_id_149,org_id_15,org_id_150,org_id_151,org_id_152,org_id_153,org_id_154,org_id_155,org_id_156,org_id_157,org_id_158,org_id_159,org_id_16,org_id_160,org_id_161,org_id_162,org_id_163,org_id_164,org_id_165,org_id_166,org_id_167,org_id_168,org_id_169,org_id_17,org_id_170,org_id_171,org_id_172,org_id_173,org_id_174,org_id_175,org_id_176,org_id_177,org_id_178,org_id_179,org_id_18,org_id_180,org_id_181,org_id_182,org_id_183,org_id_184,org_id_185,org_id_186,org_id_187,org_id_188,org_id_189,org_id_19,org_id_190,org_id_191,org_id_192,org_id_193,org_id_194,org_id_195,org_id_196,org_id_197,org_id_198,org_id_199,org_id_2,org_id_20,org_id_200,org_id_201,org_id_202,org_id_203,org_id_204,org_id_205,org_id_206,org_id_207,org_id_208,org_id_209,org_id_21,org_id_210,org_id_211,org_id_212,org_id_213,org_id_214,org_id_215,org_id_216,org_id_217,org_id_218,org_id_219,org_id_22,org_id_220,org_id_221,org_id_222,org_id_223,org_id_224,org_id_225,org_id_226,org_id_227,org_id_228,org_id_229,org_id_23,org_id_230,org_id_231,org_id_232,org_id_233,org_id_234,org_id_235,org_id_236,org_id_237,org_id_238,org_id_239,org_id_24,org_id_240,org_id_241,org_id_242,org_id_243,org_id_244,org_id_245,org_id_246,org_id_247,org_id_248,org_id_249,org_id_25,org_id_250,org_id_251,org_id_252,org_id_253,org_id_254,org_id_255,org_id_256,org_id_257,org_id_258,org_id_259,org_id_26,org_id_260,org_id_261,org_id_262,org_id_263,org_id_264,org_id_265,org_id_266,org_id_267,org_id_268,org_id_269,org_id_27,org_id_270,org_id_271,org_id_272,org_id_273,org_id_274,org_id_275,org_id_276,org_id_277,org_id_278,org_id_279,org_id_28,org_id_280,org_id_281,org_id_282,org_id_283,org_id_284,org_id_285,org_id_286,org_id_287,org_id_288,org_id_289,org_id_29,org_id_290,org_id_291,org_id_292,org_id_293,org_id_294,org_id_295,org_id_296,org_id_297,org_id_298,org_id_299,org_id_3,org_id_30,org_id_300,org_id_301,org_id_302,org_id_303,org_id_304,org_id_305,org_id_306,org_id_307,org_id_308,org_id_309,org_id_31,org_id_310,org_id_311,org_id_312,org_id_313,org_id_314,org_id_315,org_id_316,org_id_317,org_id_318,org_id_319,org_id_32,org_id_320,org_id_321,org_id_322,org_id_323,org_id_324,org_id_325,org_id_326,org_id_327,org_id_328,org_id_329,org_id_33,org_id_330,org_id_331,org_id_332,org_id_333,org_id_334,org_id_335,org_id_336,org_id_337,org_id_338,org_id_339,org_id_34,org_id_340,org_id_341,org_id_342,org_id_343,org_id_344,org_id_345,org_id_346,org_id_347,org_id_348,org_id_349,org_id_35,org_id_350,org_id_351,org_id_352,org_id_353,org_id_354,org_id_355,org_id_356,org_id_357,org_id_358,org_id_359,org_id_36,org_id_360,org_id_361,org_id_362,org_id_363,org_id_364,org_id_365,org_id_366,org_id_367,org_id_368,org_id_369,org_id_37,org_id_370,org_id_371,org_id_372,org_id_373,org_id_374,org_id_375,org_id_376,org_id_377,org_id_378,org_id_379,org_id_38,org_id_380,org_id_381,org_id_382,org_id_383,org_id_384,org_id_385,org_id_386,org_id_387,org_id_388,org_id_389,org_id_39,org_id_390,org_id_391,org_id_392,org_id_393,org_id_394,org_id_395,org_id_396,org_id_397,org_id_398,org_id_399,org_id_4,org_id_40,org_id_400,org_id_401,org_id_402,org_id_403,org_id_404,org_id_405,org_id_406,org_id_407,org_id_408,org_id_409,org_id_41,org_id_410,org_id_411,org_id_412,org_id_413,org_id_414,org_id_415,org_id_416,org_id_42,org_id_43,org_id_44,org_id_45,org_id_46,org_id_47,org_id_48,org_id_49,org_id_5,org_id_50,org_id_51,org_id_52,org_id_53,org_id_54,org_id_55,org_id_56,org_id_57,org_id_58,org_id_59,org_id_6,org_id_60,org_id_61,org_id_62,org_id_63,org_id_64,org_id_65,org_id_66,org_id_67,org_id_68,org_id_69,org_id_7,org_id_70,org_id_71,org_id_72,org_id_73,org_id_74,org_id_75,org_id_76,org_id_77,org_id_78,org_id_79,org_id_8,org_id_80,org_id_81,org_id_82,org_id_83,org_id_84,org_id_85,org_id_86,org_id_87,org_id_88,org_id_89,org_id_9,org_id_90,org_id_91,org_id_92,org_id_93,org_id_94,org_id_95,org_id_96,org_id_97,org_id_98,org_id_99,email_cuvox.de,email_gmail.com,email_gustr.com,email_hotmail.com,email_jourrapide.com,email_other,email_yahoo.com
object_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1,Unnamed: 25_level_1,Unnamed: 26_level_1,Unnamed: 27_level_1,Unnamed: 28_level_1,Unnamed: 29_level_1,Unnamed: 30_level_1,Unnamed: 31_level_1,Unnamed: 32_level_1,Unnamed: 33_level_1,Unnamed: 34_level_1,Unnamed: 35_level_1,Unnamed: 36_level_1,Unnamed: 37_level_1,Unnamed: 38_level_1,Unnamed: 39_level_1,Unnamed: 40_level_1,Unnamed: 41_level_1,Unnamed: 42_level_1,Unnamed: 43_level_1,Unnamed: 44_level_1,Unnamed: 45_level_1,Unnamed: 46_level_1,Unnamed: 47_level_1,Unnamed: 48_level_1,Unnamed: 49_level_1,Unnamed: 50_level_1,Unnamed: 51_level_1,Unnamed: 52_level_1,Unnamed: 53_level_1,Unnamed: 54_level_1,Unnamed: 55_level_1,Unnamed: 56_level_1,Unnamed: 57_level_1,Unnamed: 58_level_1,Unnamed: 59_level_1,Unnamed: 60_level_1,Unnamed: 61_level_1,Unnamed: 62_level_1,Unnamed: 63_level_1,Unnamed: 64_level_1,Unnamed: 65_level_1,Unnamed: 66_level_1,Unnamed: 67_level_1,Unnamed: 68_level_1,Unnamed: 69_level_1,Unnamed: 70_level_1,Unnamed: 71_level_1,Unnamed: 72_level_1,Unnamed: 73_level_1,Unnamed: 74_level_1,Unnamed: 75_level_1,Unnamed: 76_level_1,Unnamed: 77_level_1,Unnamed: 78_level_1,Unnamed: 79_level_1,Unnamed: 80_level_1,Unnamed: 81_level_1,Unnamed: 82_level_1,Unnamed: 83_level_1,Unnamed: 84_level_1,Unnamed: 85_level_1,Unnamed: 86_level_1,Unnamed: 87_level_1,Unnamed: 88_level_1,Unnamed: 89_level_1,Unnamed: 90_level_1,Unnamed: 91_level_1,Unnamed: 92_level_1,Unnamed: 93_level_1,Unnamed: 94_level_1,Unnamed: 95_level_1,Unnamed: 96_level_1,Unnamed: 97_level_1,Unnamed: 98_level_1,Unnamed: 99_level_1,Unnamed: 100_level_1,Unnamed: 101_level_1,Unnamed: 102_level_1,Unnamed: 103_level_1,Unnamed: 104_level_1,Unnamed: 105_level_1,Unnamed: 106_level_1,Unnamed: 107_level_1,Unnamed: 108_level_1,Unnamed: 109_level_1,Unnamed: 110_level_1,Unnamed: 111_level_1,Unnamed: 112_level_1,Unnamed: 113_level_1,Unnamed: 114_level_1,Unnamed: 115_level_1,Unnamed: 116_level_1,Unnamed: 117_level_1,Unnamed: 118_level_1,Unnamed: 119_level_1,Unnamed: 120_level_1,Unnamed: 121_level_1,Unnamed: 122_level_1,Unnamed: 123_level_1,Unnamed: 124_level_1,Unnamed: 125_level_1,Unnamed: 126_level_1,Unnamed: 127_level_1,Unnamed: 128_level_1,Unnamed: 129_level_1,Unnamed: 130_level_1,Unnamed: 131_level_1,Unnamed: 132_level_1,Unnamed: 133_level_1,Unnamed: 134_level_1,Unnamed: 135_level_1,Unnamed: 136_level_1,Unnamed: 137_level_1,Unnamed: 138_level_1,Unnamed: 139_level_1,Unnamed: 140_level_1,Unnamed: 141_level_1,Unnamed: 142_level_1,Unnamed: 143_level_1,Unnamed: 144_level_1,Unnamed: 145_level_1,Unnamed: 146_level_1,Unnamed: 147_level_1,Unnamed: 148_level_1,Unnamed: 149_level_1,Unnamed: 150_level_1,Unnamed: 151_level_1,Unnamed: 152_level_1,Unnamed: 153_level_1,Unnamed: 154_level_1,Unnamed: 155_level_1,Unnamed: 156_level_1,Unnamed: 157_level_1,Unnamed: 158_level_1,Unnamed: 159_level_1,Unnamed: 160_level_1,Unnamed: 161_level_1,Unnamed: 162_level_1,Unnamed: 163_level_1,Unnamed: 164_level_1,Unnamed: 165_level_1,Unnamed: 166_level_1,Unnamed: 167_level_1,Unnamed: 168_level_1,Unnamed: 169_level_1,Unnamed: 170_level_1,Unnamed: 171_level_1,Unnamed: 172_level_1,Unnamed: 173_level_1,Unnamed: 174_level_1,Unnamed: 175_level_1,Unnamed: 176_level_1,Unnamed: 177_level_1,Unnamed: 178_level_1,Unnamed: 179_level_1,Unnamed: 180_level_1,Unnamed: 181_level_1,Unnamed: 182_level_1,Unnamed: 183_level_1,Unnamed: 184_level_1,Unnamed: 185_level_1,Unnamed: 186_level_1,Unnamed: 187_level_1,Unnamed: 188_level_1,Unnamed: 189_level_1,Unnamed: 190_level_1,Unnamed: 191_level_1,Unnamed: 192_level_1,Unnamed: 193_level_1,Unnamed: 194_level_1,Unnamed: 195_level_1,Unnamed: 196_level_1,Unnamed: 197_level_1,Unnamed: 198_level_1,Unnamed: 199_level_1,Unnamed: 200_level_1,Unnamed: 201_level_1,Unnamed: 202_level_1,Unnamed: 203_level_1,Unnamed: 204_level_1,Unnamed: 205_level_1,Unnamed: 206_level_1,Unnamed: 207_level_1,Unnamed: 208_level_1,Unnamed: 209_level_1,Unnamed: 210_level_1,Unnamed: 211_level_1,Unnamed: 212_level_1,Unnamed: 213_level_1,Unnamed: 214_level_1,Unnamed: 215_level_1,Unnamed: 216_level_1,Unnamed: 217_level_1,Unnamed: 218_level_1,Unnamed: 219_level_1,Unnamed: 220_level_1,Unnamed: 221_level_1,Unnamed: 222_level_1,Unnamed: 223_level_1,Unnamed: 224_level_1,Unnamed: 225_level_1,Unnamed: 226_level_1,Unnamed: 227_level_1,Unnamed: 228_level_1,Unnamed: 229_level_1,Unnamed: 230_level_1,Unnamed: 231_level_1,Unnamed: 232_level_1,Unnamed: 233_level_1,Unnamed: 234_level_1,Unnamed: 235_level_1,Unnamed: 236_level_1,Unnamed: 237_level_1,Unnamed: 238_level_1,Unnamed: 239_level_1,Unnamed: 240_level_1,Unnamed: 241_level_1,Unnamed: 242_level_1,Unnamed: 243_level_1,Unnamed: 244_level_1,Unnamed: 245_level_1,Unnamed: 246_level_1,Unnamed: 247_level_1,Unnamed: 248_level_1,Unnamed: 249_level_1,Unnamed: 250_level_1,Unnamed: 251_level_1,Unnamed: 252_level_1,Unnamed: 253_level_1,Unnamed: 254_level_1,Unnamed: 255_level_1,Unnamed: 256_level_1,Unnamed: 257_level_1,Unnamed: 258_level_1,Unnamed: 259_level_1,Unnamed: 260_level_1,Unnamed: 261_level_1,Unnamed: 262_level_1,Unnamed: 263_level_1,Unnamed: 264_level_1,Unnamed: 265_level_1,Unnamed: 266_level_1,Unnamed: 267_level_1,Unnamed: 268_level_1,Unnamed: 269_level_1,Unnamed: 270_level_1,Unnamed: 271_level_1,Unnamed: 272_level_1,Unnamed: 273_level_1,Unnamed: 274_level_1,Unnamed: 275_level_1,Unnamed: 276_level_1,Unnamed: 277_level_1,Unnamed: 278_level_1,Unnamed: 279_level_1,Unnamed: 280_level_1,Unnamed: 281_level_1,Unnamed: 282_level_1,Unnamed: 283_level_1,Unnamed: 284_level_1,Unnamed: 285_level_1,Unnamed: 286_level_1,Unnamed: 287_level_1,Unnamed: 288_level_1,Unnamed: 289_level_1,Unnamed: 290_level_1,Unnamed: 291_level_1,Unnamed: 292_level_1,Unnamed: 293_level_1,Unnamed: 294_level_1,Unnamed: 295_level_1,Unnamed: 296_level_1,Unnamed: 297_level_1,Unnamed: 298_level_1,Unnamed: 299_level_1,Unnamed: 300_level_1,Unnamed: 301_level_1,Unnamed: 302_level_1,Unnamed: 303_level_1,Unnamed: 304_level_1,Unnamed: 305_level_1,Unnamed: 306_level_1,Unnamed: 307_level_1,Unnamed: 308_level_1,Unnamed: 309_level_1,Unnamed: 310_level_1,Unnamed: 311_level_1,Unnamed: 312_level_1,Unnamed: 313_level_1,Unnamed: 314_level_1,Unnamed: 315_level_1,Unnamed: 316_level_1,Unnamed: 317_level_1,Unnamed: 318_level_1,Unnamed: 319_level_1,Unnamed: 320_level_1,Unnamed: 321_level_1,Unnamed: 322_level_1,Unnamed: 323_level_1,Unnamed: 324_level_1,Unnamed: 325_level_1,Unnamed: 326_level_1,Unnamed: 327_level_1,Unnamed: 328_level_1,Unnamed: 329_level_1,Unnamed: 330_level_1,Unnamed: 331_level_1,Unnamed: 332_level_1,Unnamed: 333_level_1,Unnamed: 334_level_1,Unnamed: 335_level_1,Unnamed: 336_level_1,Unnamed: 337_level_1,Unnamed: 338_level_1,Unnamed: 339_level_1,Unnamed: 340_level_1,Unnamed: 341_level_1,Unnamed: 342_level_1,Unnamed: 343_level_1,Unnamed: 344_level_1,Unnamed: 345_level_1,Unnamed: 346_level_1,Unnamed: 347_level_1,Unnamed: 348_level_1,Unnamed: 349_level_1,Unnamed: 350_level_1,Unnamed: 351_level_1,Unnamed: 352_level_1,Unnamed: 353_level_1,Unnamed: 354_level_1,Unnamed: 355_level_1,Unnamed: 356_level_1,Unnamed: 357_level_1,Unnamed: 358_level_1,Unnamed: 359_level_1,Unnamed: 360_level_1,Unnamed: 361_level_1,Unnamed: 362_level_1,Unnamed: 363_level_1,Unnamed: 364_level_1,Unnamed: 365_level_1,Unnamed: 366_level_1,Unnamed: 367_level_1,Unnamed: 368_level_1,Unnamed: 369_level_1,Unnamed: 370_level_1,Unnamed: 371_level_1,Unnamed: 372_level_1,Unnamed: 373_level_1,Unnamed: 374_level_1,Unnamed: 375_level_1,Unnamed: 376_level_1,Unnamed: 377_level_1,Unnamed: 378_level_1,Unnamed: 379_level_1,Unnamed: 380_level_1,Unnamed: 381_level_1,Unnamed: 382_level_1,Unnamed: 383_level_1,Unnamed: 384_level_1,Unnamed: 385_level_1,Unnamed: 386_level_1,Unnamed: 387_level_1,Unnamed: 388_level_1,Unnamed: 389_level_1,Unnamed: 390_level_1,Unnamed: 391_level_1,Unnamed: 392_level_1,Unnamed: 393_level_1,Unnamed: 394_level_1,Unnamed: 395_level_1,Unnamed: 396_level_1,Unnamed: 397_level_1,Unnamed: 398_level_1,Unnamed: 399_level_1,Unnamed: 400_level_1,Unnamed: 401_level_1,Unnamed: 402_level_1,Unnamed: 403_level_1,Unnamed: 404_level_1,Unnamed: 405_level_1,Unnamed: 406_level_1,Unnamed: 407_level_1,Unnamed: 408_level_1,Unnamed: 409_level_1,Unnamed: 410_level_1,Unnamed: 411_level_1,Unnamed: 412_level_1,Unnamed: 413_level_1,Unnamed: 414_level_1,Unnamed: 415_level_1,Unnamed: 416_level_1,Unnamed: 417_level_1,Unnamed: 418_level_1,Unnamed: 419_level_1,Unnamed: 420_level_1,Unnamed: 421_level_1,Unnamed: 422_level_1,Unnamed: 423_level_1,Unnamed: 424_level_1,Unnamed: 425_level_1,Unnamed: 426_level_1,Unnamed: 427_level_1,Unnamed: 428_level_1,Unnamed: 429_level_1,Unnamed: 430_level_1,Unnamed: 431_level_1,Unnamed: 432_level_1,Unnamed: 433_level_1
1,False,1,0,True,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True
2,True,0,0,True,False,True,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False
3,False,0,0,True,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,True,False,False,False,False
4,False,0,0,True,True,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True
5,False,0,0,True,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True


### 3.3 Predictive Modeling

Proceed with training a predictive model. Referring to the scikit-learn model selection flowchart, it's recommended to utilize an ensemble model for classification problems with fewer than 100k samples. Therefore, I'll opt for Random Forest as my ensemble classifier.

To begin, I'll split the data into training and test sets:

In [21]:
# Select features (X) and target variable (y)
X = df_ml.iloc[:, 1:]
y = df_ml.loc[:, 'adopted_user']

# Split the data into training and test sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=6, stratify=y)

### 3.4 Random Forest Classifier

In [22]:
from sklearn.ensemble import RandomForestClassifier

# Initialize the RandomForestClassifier with balanced subsample weights and a random state for reproducibility
rfc = RandomForestClassifier(class_weight='balanced_subsample', random_state=6)

# Fit the model to the training data
rfc.fit(X_train, y_train)

# Make predictions on the test set
y_pred_rfc = rfc.predict(X_test)

# Print the accuracy on the training set
print('Accuracy on training set = {}'.format(rfc.score(X_train, y_train)))

# Print the accuracy on the test set
print('Accuracy on test set = {}'.format(rfc.score(X_test, y_test)))

Accuracy on training set = 0.9426666666666667
Accuracy on test set = 0.836


In [23]:
from sklearn.metrics import classification_report

# Generate a classification report
print(classification_report(y_test, y_pred_rfc))

              precision    recall  f1-score   support

       False       0.88      0.94      0.91      2639
        True       0.17      0.09      0.12       361

    accuracy                           0.84      3000
   macro avg       0.53      0.52      0.52      3000
weighted avg       0.80      0.84      0.81      3000



In [24]:
from sklearn.metrics import confusion_matrix

# Compute the confusion matrix
print(confusion_matrix(y_test, y_pred_rfc))

[[2474  165]
 [ 327   34]]


In [25]:
# Create a DataFrame to store feature importances
df_features = pd.DataFrame({'rfc': rfc.feature_importances_}, index=df_ml.columns[1:])

# Sort the DataFrame by feature importances in descending order and display the top 15 features
df_features.sort_values('rfc', ascending=False)[:15]

Unnamed: 0,rfc
mailing_list,0.035676
marketing_drip,0.024367
email_gmail.com,0.01687
email_jourrapide.com,0.013444
email_yahoo.com,0.013412
email_other,0.013091
email_gustr.com,0.012633
creation_source_PERSONAL_PROJECTS,0.011543
email_cuvox.de,0.009997
email_hotmail.com,0.009989


While the initial assessment may suggest that the "out-of-the-box" Random Forest Classifier performed well by correctly classifying 84% of the test set samples, it's crucial to consider the null accuracy rate of 88% (if the model simply predicted that every user did not adopt the product). In cases of imbalanced data like ours, it becomes essential to analyze the model's ability to identify the minority class (in this scenario, the adopted users). Models often tend to favor majority classes when data is imbalanced. Given that the primary objective of this project is to determine which features are most predictive of future user product adoption, it's particularly critical to focus on identifying the minority class of adopted users.

Here, the classification report and confusion matrix reveal that the "out-of-the-box" model identified adopted users at a notably low rate (only 29 out of 361 in the test set). Consequently, the feature importance values of the model may not offer much insight into which features are most predictive of future user product adoption.

However, all hope is not lost. The Random Forest Classifier offers several hyperparameters that can be fine-tuned to enhance performance. Let's proceed to tune some of these hyperparameters to assess whether it improves the model's accuracy rate on predicting adopted users:

In [None]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Initialize RandomForestClassifier with a random state
rfc = RandomForestClassifier(random_state=6)

# Define the grid of hyperparameters to search
param_grid = {
    'n_estimators': [100, 300, 500],
    'max_depth': [5, 15, 30],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 3, 5]
}

# Perform GridSearchCV with cross-validation to find the best hyperparameters
grid = GridSearchCV(rfc, param_grid, cv=3, scoring='accuracy')
grid.fit(X_train, y_train)

# Make predictions on the test set using the best model from grid search
y_pred_rfc = grid.predict(X_test)

# Print the accuracy score of the best model on the test set
print(grid.score(X_test, y_test))

In [None]:
grid.best_score_

In [None]:
grid.best_estimator_

It appears that tuning certain hyperparameters has enhanced the overall accuracy of the model. Let's utilize the classifier with the adjusted hyperparameter values on both the training and test sets to determine if this enables the classifier to more accurately identify adopted users:

### 3.5 Tuned Model Performance

In [None]:
from sklearn.ensemble import RandomForestClassifier

# Initialize RandomForestClassifier with tuned hyperparameters
rfc_tuned = RandomForestClassifier(n_estimators=100, max_depth=5, min_samples_split=2, min_samples_leaf=1, criterion='gini', class_weight='balanced_subsample', random_state=6)

# Fit the model to the training data
rfc_tuned.fit(X_train, y_train)

# Make predictions on the test set
y_pred_rfc_tuned = rfc_tuned.predict(X_test)

# Print the accuracy on the training set
print('Accuracy on training set = {}'.format(rfc_tuned.score(X_train, y_train)))

# Print the accuracy on the test set
print('Accuracy on test set = {}'.format(rfc_tuned.score(X_test, y_test)))

In [None]:
from sklearn.metrics import classification_report

# Generate a classification report
print(classification_report(y_test, y_pred_rfc_tuned))

In [None]:
from sklearn.metrics import confusion_matrix

# Compute the confusion matrix
print(confusion_matrix(y_test, y_pred_rfc_tuned))

Despite the notable decrease in the overall accuracy of the tuned classifier, the confusion matrix reveals a significant improvement in accurately identifying over half of the adopted users (183 out of 361 in the test set). Consequently, we should be able to gain insights into which specific features are most predictive of future user adoption by examining the features that received the greatest increase in feature importance according to the tuned model. Let's delve into this further:

In [None]:
# Add the feature importances from the tuned RandomForestClassifier to df_features
df_features['rfc_tuned'] = rfc_tuned.feature_importances_

# Calculate the difference in feature importances between the tuned and original model
df_features['difference'] = df_features['rfc_tuned'] - df_features['rfc']

# Display the top 15 features with the highest increase in importance
df_features.sort_values('difference', ascending=False)[:15]

### 4.0 Conclusion 

The table above highlights the factors most likely to influence a user's adoption of the product. It suggests that users are more inclined to adopt the product when they receive invitations to join another user's workspace or when they are invited as guests to an organization. Additionally, membership in specific organizations, particularly 'org_id_0,' increases the likelihood of user adoption. The tuned classifier also emphasizes the importance of certain email domains, such as Google authentication, and receiving invitations to become full members of organizations.

Based on these findings, I propose the following recommendations for Relax to enhance future user adoption efforts:

1. Launch promotional campaigns to encourage users to invite potential users to join their personal workspaces on the product.
2. Initiate promotions aimed at encouraging organizations to expand by sending guest and/or full invitations to potential users to join the product.
3. Highlight specific organizations, particularly org_id_0, to all users for increased adoption potential.
4. Tailor email marketing campaigns to target existing and potential users with yahoo.com, hotmail.com, and gmail.com email addresses, as identified by the model.