# Problem Statement

## Marketplace Feature Table 

Your client, **ComZ**, is an e-commerce company. It wants to target the right **customers** with the right products to increase overall revenue and conversion rate.

To do this, ComZ needs an ML model for marketing built on each user's past interactions with products, such as the number of views, the most viewed product, the number of user activities, the vintage of the user, and others.

ComZ has asked the Data Science and Engineering team to use this information to drive personalized advertisements, email marketing campaigns, and special offers on the landing and category pages of the company's website.

As part of the data engineering team, you are expected to **"Develop input features"** for the efficient marketing model, given the **Visitor log data** and **User Data**.

# Importing Libraries

In [1]:
import pandas as pd
import numpy as np
import datetime
from datetime import timedelta

# Data Overview

In [2]:
#Load and read the visitor log dataset
visitor_log_df = pd.read_csv('VisitorLogsData.csv')
visitor_log_df

Unnamed: 0,webClientID,VisitDateTime,ProductID,UserID,Activity,Browser,OS,City,Country
0,WI10000050298,2018-05-07 04:28:45.970,pr100631,,,Chrome Mobile,Android,Chennai,India
1,WI10000025922,2018-05-13 07:26:04.964,pr100707,,,Chrome,Windows,,Taiwan
2,WI100000204522,2018-05-11 11:43:42.832,pr100030,,click,Chrome,windows,Gurgaon,India
3,WI10000011974,2018-05-13 15:20:23.436,Pr100192,,CLICK,Chrome,Windows,,
4,WI100000441953,2018-05-08 20:44:25.238,Pr100762,,click,Chrome,mac os x,Iselin,United States
...,...,...,...,...,...,...,...,...,...
6587995,WI100000406653,2018-05-21 07:14:03.231,pr100008,,pageload,Chrome,Windows,,
6587996,WI100000159562,2018-05-25 09:13:04.011,Pr100307,,click,Chrome,Windows,,France
6587997,WI100000215596,,Pr100147,,,Chrome,Windows,Durgapur,India
6587998,WI100000174318,2018-05-20 12:09:35.347,pr100728,,pageload,Chrome Mobile,Android,Coimbatore,India


In [3]:
#Load and read the user dataset
user_df = pd.read_csv('userTable.csv')
user_df

Unnamed: 0,UserID,Signup Date,User Segment
0,U133159,2018-04-14 07:01:16.202607+00:00,C
1,U129368,2017-12-02 09:38:41.584270+00:00,B
2,U109654,2013-03-19 11:38:55+00:00,B
3,U108998,2018-01-18 08:29:51.627954+00:00,C
4,U131393,2018-03-27 08:05:28.806800+00:00,B
...,...,...,...
34045,U134073,2018-03-19 12:35:10.857456+00:00,B
34046,U113667,2018-02-28 08:29:22.966713+00:00,B
34047,U128470,2018-03-04 13:03:12.828673+00:00,B
34048,U104005,2018-01-22 16:16:35.289000+00:00,B


In [4]:
#Info of the two datasets
visitor_log_df.info()
print('\n------------------------------------------\n')
user_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6588000 entries, 0 to 6587999
Data columns (total 9 columns):
 #   Column         Dtype 
---  ------         ----- 
 0   webClientID    object
 1   VisitDateTime  object
 2   ProductID      object
 3   UserID         object
 4   Activity       object
 5   Browser        object
 6   OS             object
 7   City           object
 8   Country        object
dtypes: object(9)
memory usage: 452.4+ MB

------------------------------------------

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 34050 entries, 0 to 34049
Data columns (total 3 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   UserID        34050 non-null  object
 1   Signup Date   34050 non-null  object
 2   User Segment  34050 non-null  object
dtypes: object(3)
memory usage: 798.2+ KB


In [5]:
#Null values in the visitor log dataset
visitor_log_df.isnull().sum()

webClientID            0
VisitDateTime     658915
ProductID         527137
UserID           5937305
Activity          889446
Browser                0
OS                     0
City             2165831
Country           397693
dtype: int64

In [6]:
#Null values in the user dataframe
user_df.isnull().sum()

UserID          0
Signup Date     0
User Segment    0
dtype: int64
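
Raw null counts are hard to compare across columns, so it can help to express them as a share of all rows. The sketch below only re-uses the numbers printed above (6,588,000 rows in total); on the live frame the same figures would come from `visitor_log_df.isnull().mean() * 100`.

```python
import pandas as pd

# Null counts copied from the output of In [5]; total rows from visitor_log_df.info().
null_counts = pd.Series({
    'VisitDateTime': 658915,
    'ProductID': 527137,
    'UserID': 5937305,
    'Activity': 889446,
    'City': 2165831,
    'Country': 397693,
})
total_rows = 6588000

# Share of missing values per column, in percent.
null_pct = (null_counts / total_rows * 100).round(1)
print(null_pct.sort_values(ascending=False))
```

Roughly nine out of ten log rows carry no `UserID`, which is why the cleaning step that follows shrinks the log so sharply.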

# Data Cleaning

### We keep only the rows for registered users, i.e. the rows where the UserID column is not null.

In [7]:
#Take an explicit copy so that later edits do not trigger SettingWithCopyWarning
visitor_log_df = visitor_log_df[visitor_log_df.UserID.notna()].copy()

In [8]:
#Drop the columns that are not needed for the features
visitor_log_df = visitor_log_df.drop(['webClientID','City','Country'], axis=1)


In [9]:
#Some values appear in both lowercase and capitalized forms,
#so we map them to a single canonical spelling.
visitor_log_df = visitor_log_df.replace(
    ['click','pageload','windows','android','mac os x','linux','ios','ubuntu','chrome os','fedora','other','tizen','freebsd',
     'solaris'],
    ['CLICK','PAGELOAD','Windows','Android','Mac OS X','Linux','iOS','Ubuntu','Chrome OS','Fedora','Other','Tizen','FreeBSD',
     'Solaris'])


In [10]:
#First we convert every timestamp to Unix epoch nanoseconds.
#Missing timestamps are imputed with a random instant between the start and end dates.
#All values are collected in a list.

a = []

start = pd.Timestamp('2018-05-07 00:00:00.000').value
end = pd.Timestamp('2018-05-28 00:00:00.000').value

for date in visitor_log_df['VisitDateTime']:
    if isinstance(date, str):
        if len(date) == 23:
            #Formatted timestamp such as '2018-05-07 04:28:45.970'
            a.append(pd.Timestamp(date).value)
        else:
            #Any other string is treated as epoch nanoseconds already
            a.append(int(date))
    else:
        #Missing value (NaN): draw a random instant inside the window
        a.append(int(np.random.uniform(low=start, high=end, size=None)))
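
The row-by-row loop can be slow on a large log. As a hedged alternative, the same three cases (formatted timestamp, epoch-nanosecond string, missing value) can be handled with boolean masks. This is a sketch on toy data, not the notebook's own method. The seeded generator `rng` and the names `raw` and `parsed` are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Toy input: a formatted timestamp, an epoch-nanosecond string, and a missing value.
raw = pd.Series(['2018-05-07 04:28:45.970', '1526366895249000000', np.nan])

is_text_date = raw.str.len() == 23          # a missing length compares as False
is_epoch = raw.notna() & ~is_text_date      # remaining strings are assumed to be epoch ns

parsed = pd.Series(pd.NaT, index=raw.index, dtype='datetime64[ns]')
parsed.loc[is_text_date] = pd.to_datetime(raw[is_text_date])
parsed.loc[is_epoch] = pd.to_datetime(raw[is_epoch].astype('int64'), unit='ns')

# Impute missing timestamps uniformly inside [start, end); seeded so reruns match.
start = pd.Timestamp('2018-05-07').value
end = pd.Timestamp('2018-05-28').value
rng = np.random.default_rng(0)
missing = parsed.isna()
random_ns = rng.integers(start, end, size=int(missing.sum()))
parsed.loc[missing] = pd.to_datetime(random_ns).to_numpy()
print(parsed)
```

Seeding the generator keeps the imputed dates, and therefore every date-based feature, reproducible between runs.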

In [11]:
#Convert the Unix epoch values back to datetime format
visitor_log_df['VisitDateTime'] = pd.to_datetime(a)



In [12]:
visitor_log_df.head(20)

Unnamed: 0,VisitDateTime,ProductID,UserID,Activity,Browser,OS
14,2018-05-15 06:48:15.249,Pr100017,U106593,CLICK,Chrome Mobile,Android
21,2018-05-23 07:02:01.790,Pr101008,U108297,,Chrome Mobile,Android
23,2018-05-10 06:28:53.391,Pr100241,U132443,,Firefox,Windows
24,2018-05-08 12:40:02.153,pr100495,U134616,CLICK,Chrome,Windows
33,2018-05-11 15:35:43.689,Pr100363,U130784,CLICK,Chrome,Chrome OS
50,2018-05-19 00:02:31.347,pr100340,U120983,CLICK,Chrome,Windows
54,2018-05-19 04:51:45.337,Pr100166,U120287,CLICK,Chrome,Windows
61,2018-05-07 05:54:39.408,pr101042,U124307,CLICK,Chrome,Mac OS X
68,2018-05-23 09:44:44.023,Pr101042,U113937,CLICK,Safari,Mac OS X
69,2018-05-13 13:17:03.751,Pr101042,U115735,CLICK,Chrome,Windows
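
The preview above contains both `pr101042` and `Pr101042`. If these denote the same product (an assumption, since the data dictionary is not shown here), the IDs would need a single spelling before any per-product count. A minimal sketch on toy rows:

```python
import pandas as pd

# Toy rows mimicking the mixed-case IDs in the preview; one value is missing.
toy = pd.DataFrame({'ProductID': ['Pr101042', 'pr101042', None, 'pr100495']})

# Upper-case only the leading letter; missing values stay missing.
toy['ProductID'] = toy['ProductID'].str.capitalize()
print(toy['ProductID'].nunique())
```

`nunique()` ignores missing values by default, so rows without a `ProductID` do not add to the count.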


In [13]:
#Merge visitor log data table and user data table
df = pd.merge(visitor_log_df, user_df, on='UserID')
df.head()

Unnamed: 0,VisitDateTime,ProductID,UserID,Activity,Browser,OS,Signup Date,User Segment
0,2018-05-15 06:48:15.249,Pr100017,U106593,CLICK,Chrome Mobile,Android,2017-09-10 15:48:09.451327+00:00,B
1,2018-05-16 06:53:23.737,Pr100241,U106593,PAGELOAD,Chrome Mobile,Android,2017-09-10 15:48:09.451327+00:00,B
2,2018-05-15 06:47:49.239,Pr100017,U106593,CLICK,Chrome Mobile,Android,2017-09-10 15:48:09.451327+00:00,B
3,2018-05-16 06:54:26.424,pr100901,U106593,PAGELOAD,Chrome Mobile,Android,2017-09-10 15:48:09.451327+00:00,B
4,2018-05-15 06:47:36.691,Pr100017,U106593,CLICK,Chrome Mobile,Android,2017-09-10 15:48:09.451327+00:00,B


In [14]:
#Convert to datetime format
df['Signup Date'] = pd.to_datetime(df['Signup Date']).dt.tz_localize(None)

In [15]:
#Drop useless columns
df.drop('User Segment', axis=1, inplace=True)

In [16]:
#Set VisitDateTime column as index and select the dates we want and sort them
#Then reset the index
df = df.set_index('VisitDateTime').sort_index().loc['2018-05-07':'2018-05-27'].reset_index()
df

Unnamed: 0,VisitDateTime,ProductID,UserID,Activity,Browser,OS,Signup Date
0,2018-05-07 00:00:01.419000000,Pr100083,U107518,CLICK,Chrome,Windows,2018-04-13 21:41:58.458685
1,2018-05-07 00:00:07.040000000,Pr100709,U136965,PAGELOAD,Chrome Mobile,Android,2017-09-27 01:52:21.434546
2,2018-05-07 00:00:10.434000000,,U136963,,Chrome Mobile,Android,2014-07-08 10:13:47.000000
3,2018-05-07 00:00:11.164000000,Pr100102,U136963,CLICK,Chrome Mobile,Android,2014-07-08 10:13:47.000000
4,2018-05-07 00:00:12.156000000,pr100102,U136963,CLICK,Chrome Mobile,Android,2014-07-08 10:13:47.000000
...,...,...,...,...,...,...,...
650690,2018-05-27 23:58:47.845932544,pr100051,U104056,,Chrome,Windows,2017-08-02 13:30:02.262606
650691,2018-05-27 23:59:10.551000000,pr100275,U104304,PAGELOAD,Chrome,Windows,2018-04-07 22:00:09.522100
650692,2018-05-27 23:59:23.729000000,Pr100254,U104303,,Chrome,Windows,2017-10-23 15:05:13.086042
650693,2018-05-27 23:59:48.631126272,,U104736,CLICK,Chrome,Windows,2017-04-17 05:20:09.564063
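
The `.loc['2018-05-07':'2018-05-27']` slice relies on partial-string indexing, which includes the whole of 27 May. An equivalent formulation with an explicit half-open interval is sketched below on toy data; `start`, `end` and `in_window` are illustrative names.

```python
import pandas as pd

toy = pd.DataFrame({
    'VisitDateTime': pd.to_datetime([
        '2018-05-06 23:59:59', '2018-05-07 00:00:01',
        '2018-05-27 23:59:48', '2018-05-28 00:00:00',
    ]),
    'UserID': ['U1', 'U2', 'U3', 'U4'],
})

# Keep visits in [2018-05-07, 2018-05-28): the start is included, the end is excluded.
start, end = pd.Timestamp('2018-05-07'), pd.Timestamp('2018-05-28')
in_window = (toy['VisitDateTime'] >= start) & (toy['VisitDateTime'] < end)
windowed = toy[in_window].sort_values('VisitDateTime').reset_index(drop=True)
print(windowed['UserID'].tolist())
```

The mask form states the boundaries explicitly and does not require moving the timestamp column into the index and back.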


In [17]:
#Load the sample submission file
sample_df = pd.read_csv("sample_submission_M7Vpb9f.csv")
sample_df.shape

(34050, 9)

In [18]:
#Create a new dataframe and take the UserID column from the sample file
my_submission = pd.DataFrame()
my_submission['UserID'] = sample_df.UserID

In [19]:
#Fill the missing Activity values with the most frequent value (the mode) of the column
df['Activity'] = df['Activity'].fillna(df['Activity'].mode()[0])

# Feature Engineering

## 1. No_of_days_Visited_7_Days:

### How many days a user was active on the platform in the last 7 days.

In [20]:
#Select data only from the last 7 days
last_7_days = df[df.VisitDateTime >= pd.to_datetime('2018-05-28 00:00:00.000') - timedelta(days=7)]
last_7_days

Unnamed: 0,VisitDateTime,ProductID,UserID,Activity,Browser,OS,Signup Date
416384,2018-05-21 00:00:03.968000000,,U119712,CLICK,Chrome,Windows,2017-02-14 13:11:02.081089
416385,2018-05-21 00:00:07.237000000,Pr100390,U119712,PAGELOAD,Chrome,Windows,2017-02-14 13:11:02.081089
416386,2018-05-21 00:00:16.177000000,Pr101290,U102541,PAGELOAD,Opera,Windows,2016-03-29 00:39:59.354969
416387,2018-05-21 00:00:16.177000000,Pr101290,U102541,PAGELOAD,Opera,Windows,2016-03-29 00:39:59.354969
416388,2018-05-21 00:00:25.082000000,pr102030,U119711,PAGELOAD,Chrome,Windows,2016-10-14 11:35:10.366146
...,...,...,...,...,...,...,...
650690,2018-05-27 23:58:47.845932544,pr100051,U104056,CLICK,Chrome,Windows,2017-08-02 13:30:02.262606
650691,2018-05-27 23:59:10.551000000,pr100275,U104304,PAGELOAD,Chrome,Windows,2018-04-07 22:00:09.522100
650692,2018-05-27 23:59:23.729000000,Pr100254,U104303,CLICK,Chrome,Windows,2017-10-23 15:05:13.086042
650693,2018-05-27 23:59:48.631126272,,U104736,CLICK,Chrome,Windows,2017-04-17 05:20:09.564063


In [21]:
no_of_days_visited_7_days = last_7_days.groupby(['UserID',last_7_days['VisitDateTime'].dt.date]).count()\
                                        .drop(columns=['VisitDateTime']).reset_index()

no_of_days_visited_7_days = no_of_days_visited_7_days.groupby('UserID')['VisitDateTime'].count()\
                                                        .reset_index(name='No_of_days_Visited_7_Days')

In [22]:
no_of_days_visited_7_days

Unnamed: 0,UserID,No_of_days_Visited_7_Days
0,U100003,1
1,U100004,2
2,U100005,1
3,U100006,1
4,U100008,6
...,...,...
19831,U136936,1
19832,U136949,1
19833,U136956,1
19834,U136959,1
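
The two-step `groupby` above counts the distinct calendar days per user. As a sketch under the same definition, this can also be written as one `nunique()` over the day part of the timestamp. The toy frame and the helper column `visit_day` are illustrative.

```python
import pandas as pd

toy = pd.DataFrame({
    'UserID': ['U1', 'U1', 'U1', 'U2'],
    'VisitDateTime': pd.to_datetime([
        '2018-05-21 09:00', '2018-05-21 18:30',
        '2018-05-23 07:15', '2018-05-27 23:59',
    ]),
})

# Truncate each timestamp to midnight, then count distinct days per user.
days_visited = (
    toy.assign(visit_day=toy['VisitDateTime'].dt.normalize())
       .groupby('UserID')['visit_day']
       .nunique()
       .reset_index(name='No_of_days_Visited_7_Days')
)
print(days_visited)
```

Here `U1` has three visits on two distinct days, so the feature value is 2 rather than 3.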


In [23]:
#Merge the above dataframe with the final submission dataframe
my_submission = my_submission.merge(no_of_days_visited_7_days, on='UserID', how='left').fillna(0)

In [24]:
my_submission

Unnamed: 0,UserID,No_of_days_Visited_7_Days
0,U100002,0.0
1,U100003,1.0
2,U100004,2.0
3,U100005,1.0
4,U100006,1.0
...,...,...
34045,U136960,0.0
34046,U136961,0.0
34047,U136963,1.0
34048,U136964,0.0
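
The counts print as `0.0`, `1.0`, and so on because the left merge introduces `NaN` for users with no visits, which forces a float column. Whether the submission expects integers is not stated here. If it does, one option is to cast after filling, as in this toy sketch:

```python
import pandas as pd

users = pd.DataFrame({'UserID': ['U1', 'U2', 'U3']})
counts = pd.DataFrame({'UserID': ['U1', 'U3'],
                       'No_of_days_Visited_7_Days': [2, 1]})

# Left merge keeps every user; users without visits get NaN, then 0.
merged = users.merge(counts, on='UserID', how='left')
merged['No_of_days_Visited_7_Days'] = (
    merged['No_of_days_Visited_7_Days'].fillna(0).astype(int)
)
print(merged)
```

Filling one named column, rather than calling `fillna` on the whole frame, also leaves every other column untouched.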


## 2. No_Of_Products_Viewed_15_Days:

### Number of Products viewed by the user in the last 15 days.

In [25]:
#Select data only from the last 15 days
last_15_days = df[df.VisitDateTime >= pd.to_datetime('2018-05-28 00:00:00.000') - timedelta(days=15)]
last_15_days

Unnamed: 0,VisitDateTime,ProductID,UserID,Activity,Browser,OS,Signup Date
195367,2018-05-13 00:00:04.754717696,Pr100035,U100069,CLICK,Chrome,Windows,2013-07-15 21:01:26.000000
195368,2018-05-13 00:00:07.172000000,Pr100070,U122747,CLICK,Chrome,Windows,2017-12-04 04:53:42.975558
195369,2018-05-13 00:00:07.495000000,pr100070,U122747,CLICK,Chrome,Windows,2017-12-04 04:53:42.975558
195370,2018-05-13 00:00:07.800000000,pr100070,U122747,CLICK,Chrome,Windows,2017-12-04 04:53:42.975558
195371,2018-05-13 00:00:12.499000000,pr100070,U122747,CLICK,Chrome,Windows,2017-12-04 04:53:42.975558
...,...,...,...,...,...,...,...
650690,2018-05-27 23:58:47.845932544,pr100051,U104056,CLICK,Chrome,Windows,2017-08-02 13:30:02.262606
650691,2018-05-27 23:59:10.551000000,pr100275,U104304,PAGELOAD,Chrome,Windows,2018-04-07 22:00:09.522100
650692,2018-05-27 23:59:23.729000000,Pr100254,U104303,CLICK,Chrome,Windows,2017-10-23 15:05:13.086042
650693,2018-05-27 23:59:48.631126272,,U104736,CLICK,Chrome,Windows,2017-04-17 05:20:09.564063


In [26]:
no_of_products_viewed_15_days = last_15_days.groupby('UserID')['ProductID'].nunique()\
                                             .reset_index(name='No_Of_Products_Viewed_15_Days')
no_of_products_viewed_15_days

Unnamed: 0,UserID,No_Of_Products_Viewed_15_Days
0,U100002,3
1,U100003,3
2,U100004,16
3,U100005,3
4,U100006,1
...,...,...
28478,U136936,1
28479,U136949,1
28480,U136956,1
28481,U136959,1


In [27]:
#Merge the above dataframe with the final submission dataframe
my_submission = my_submission.merge(no_of_products_viewed_15_days, on='UserID', how='left').fillna(0)

In [28]:
my_submission

Unnamed: 0,UserID,No_of_days_Visited_7_Days,No_Of_Products_Viewed_15_Days
0,U100002,0.0,3.0
1,U100003,1.0,3.0
2,U100004,2.0,16.0
3,U100005,1.0,3.0
4,U100006,1.0,1.0
...,...,...,...
34045,U136960,0.0,0.0
34046,U136961,0.0,0.0
34047,U136963,1.0,1.0
34048,U136964,0.0,0.0


## 3. User_Vintage:

### Vintage (In Days) of the user as of today (28-May-2018).

In [29]:
df['User_Vintage'] = (pd.to_datetime('2018-05-28 00:00:00.000') - df['Signup Date']).dt.days

In [30]:
#Merge it with the final submission dataframe
my_submission = my_submission.merge(df.groupby('UserID')['User_Vintage'].first().reset_index(), on='UserID', how='left')

In [31]:
my_submission

Unnamed: 0,UserID,No_of_days_Visited_7_Days,No_Of_Products_Viewed_15_Days,User_Vintage
0,U100002,0.0,3.0,52
1,U100003,1.0,3.0,1020
2,U100004,2.0,16.0,340
3,U100005,1.0,3.0,680
4,U100006,1.0,1.0,54
...,...,...,...,...
34045,U136960,0.0,0.0,754
34046,U136961,0.0,0.0,59
34047,U136963,1.0,1.0,1419
34048,U136964,0.0,0.0,494
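
`User_Vintage` above is derived from the merged visit log, so it is only defined for users with at least one visit in the selected window. Vintage depends only on the signup date, so it could also be computed straight from the user table. The sketch below uses two rows copied from the `user_df` preview; the name `as_of` is illustrative.

```python
import pandas as pd

users = pd.DataFrame({
    'UserID': ['U133159', 'U129368'],
    'Signup Date': ['2018-04-14 07:01:16.202607+00:00',
                    '2017-12-02 09:38:41.584270+00:00'],
})

as_of = pd.Timestamp('2018-05-28')

# Parse as UTC, drop the timezone, and count whole days up to the reference date.
signup = pd.to_datetime(users['Signup Date'], utc=True).dt.tz_localize(None)
users['User_Vintage'] = (as_of - signup).dt.days
print(users[['UserID', 'User_Vintage']])
```

`.dt.days` keeps only completed days, so a user who signed up 43 days and 17 hours before the reference date has a vintage of 43.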


## 4. Most_Viewed_product_15_Days:

### Most frequently viewed (page loads) product by the user in the last 15 days. If multiple products have the same number of page loads, consider the most recent one. If a user has not viewed any product in the last 15 days, put it as Product101.

In [32]:
most_viewed_product_15_days = last_15_days[last_15_days['Activity']=='PAGELOAD'].groupby(['UserID','ProductID'])\
                                    .agg({'Activity':'count','VisitDateTime':'max'}).reset_index()

most_viewed_product_15_days = most_viewed_product_15_days.sort_values(['Activity','VisitDateTime'], ascending=False)\
                                                            .drop_duplicates(['UserID']).sort_values('UserID')

most_viewed_product_15_days = most_viewed_product_15_days.rename(columns={'ProductID':'Most_Viewed_product_15_Days'})

most_viewed_product_15_days = most_viewed_product_15_days[['UserID','Most_Viewed_product_15_Days']]

In [33]:
most_viewed_product_15_days

Unnamed: 0,UserID,Most_Viewed_product_15_Days
0,U100002,Pr100258
1,U100003,Pr100079
7,U100004,Pr100753
12,U100005,Pr100234
15,U100006,Pr101111
...,...,...
69380,U136885,Pr101677
69381,U136934,Pr100130
69382,U136936,Pr100003
69383,U136949,Pr100083
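
The tie-break rule (equal page-load counts resolved by the most recent view) comes from sorting on count and then on last-seen time before keeping the first row per user. A toy sketch of that pattern, with illustrative column names `views` and `last_seen`:

```python
import pandas as pd

toy = pd.DataFrame({
    'UserID': ['U1'] * 5,
    'ProductID': ['Pr1', 'Pr1', 'Pr2', 'Pr2', 'Pr3'],
    'VisitDateTime': pd.to_datetime([
        '2018-05-14 10:00', '2018-05-15 10:00',
        '2018-05-16 10:00', '2018-05-20 10:00',
        '2018-05-25 10:00',
    ]),
})

# Views and most recent view per (user, product).
per_product = (
    toy.groupby(['UserID', 'ProductID'])
       .agg(views=('VisitDateTime', 'size'), last_seen=('VisitDateTime', 'max'))
       .reset_index()
)

# Highest count first; ties broken by the later last_seen; keep one row per user.
top = (
    per_product.sort_values(['views', 'last_seen'], ascending=False)
               .drop_duplicates('UserID')
)
print(top)
```

`Pr1` and `Pr2` both have two views, and `Pr2` wins because it was seen later. `Pr3` is the most recent product overall but has only one view, so it is not selected.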


In [34]:
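#Merge the above dataframe with the final submission dataframe
#Fill only the new column so that missing values in other columns are not overwritten
my_submission = my_submission.merge(most_viewed_product_15_days, on='UserID', how='left')
my_submission['Most_Viewed_product_15_Days'] = my_submission['Most_Viewed_product_15_Days'].fillna('Product101')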

In [35]:
my_submission

Unnamed: 0,UserID,No_of_days_Visited_7_Days,No_Of_Products_Viewed_15_Days,User_Vintage,Most_Viewed_product_15_Days
0,U100002,0.0,3.0,52,Pr100258
1,U100003,1.0,3.0,1020,Pr100079
2,U100004,2.0,16.0,340,Pr100753
3,U100005,1.0,3.0,680,Pr100234
4,U100006,1.0,1.0,54,Pr101111
...,...,...,...,...,...
34045,U136960,0.0,0.0,754,Product101
34046,U136961,0.0,0.0,59,Product101
34047,U136963,1.0,1.0,1419,Product101
34048,U136964,0.0,0.0,494,Product101


## 5. Most_Active_OS:

### Most Frequently used OS by user. 

In [36]:
most_active_OS = df.groupby(['UserID','OS']).agg({'OS':'count','VisitDateTime':'max'}).rename(columns={'OS':'OS_count'})\
                                            .reset_index()

most_active_OS = most_active_OS.sort_values(['OS_count','VisitDateTime'], ascending=False)

most_active_OS = most_active_OS.drop_duplicates(['UserID']).sort_values('UserID')

most_active_OS = most_active_OS.rename(columns={'OS':'Most_Active_OS'})

most_active_OS = most_active_OS[['UserID', 'Most_Active_OS']]

In [37]:
most_active_OS

Unnamed: 0,UserID,Most_Active_OS
0,U100002,Android
1,U100003,Windows
2,U100004,Windows
3,U100005,Android
4,U100006,Android
...,...,...
36265,U136960,Windows
36266,U136961,Android
36267,U136963,Android
36268,U136964,Windows


In [38]:
#Merge the above dataframe with the final submission dataframe
my_submission = my_submission.merge(most_active_OS, on='UserID', how='left')

In [39]:
my_submission

Unnamed: 0,UserID,No_of_days_Visited_7_Days,No_Of_Products_Viewed_15_Days,User_Vintage,Most_Viewed_product_15_Days,Most_Active_OS
0,U100002,0.0,3.0,52,Pr100258,Android
1,U100003,1.0,3.0,1020,Pr100079,Windows
2,U100004,2.0,16.0,340,Pr100753,Windows
3,U100005,1.0,3.0,680,Pr100234,Android
4,U100006,1.0,1.0,54,Pr101111,Android
...,...,...,...,...,...,...
34045,U136960,0.0,0.0,754,Product101,Windows
34046,U136961,0.0,0.0,59,Product101,Android
34047,U136963,1.0,1.0,1419,Product101,Android
34048,U136964,0.0,0.0,494,Product101,Windows


## 6. Recently_Viewed_Product:

### Most recently viewed (page loads) product by the user. If a user has not viewed any product then put it as Product101.

In [40]:
recently_viewed_product = df[df.Activity == 'PAGELOAD'].groupby(['UserID','ProductID'])['VisitDateTime'].max().reset_index()

recently_viewed_product = recently_viewed_product.sort_values('VisitDateTime', ascending=False).drop_duplicates(['UserID'])\
                            .sort_values('UserID')

recently_viewed_product = recently_viewed_product.rename(columns={'ProductID':'Recently_Viewed_Product'})

recently_viewed_product = recently_viewed_product[['UserID', 'Recently_Viewed_Product']]

In [41]:
recently_viewed_product

Unnamed: 0,UserID,Recently_Viewed_Product
0,U100002,Pr100258
1,U100003,Pr100079
7,U100004,Pr100753
12,U100005,Pr100234
17,U100006,Pr101111
...,...,...
97973,U136956,Pr100312
97974,U136959,Pr100102
97977,U136961,Pr101381
97978,U136963,Pr100166
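
Since only the latest page load per user matters here, the intermediate per-product maximum is not strictly needed. A hedged sketch of a shorter route on toy data: filter to page loads with a known product, sort by time, and keep the last row per user.

```python
import pandas as pd

toy = pd.DataFrame({
    'UserID': ['U1', 'U1', 'U2', 'U2'],
    'ProductID': ['Pr1', 'Pr2', 'Pr3', None],
    'Activity': ['PAGELOAD'] * 4,
    'VisitDateTime': pd.to_datetime([
        '2018-05-10 08:00', '2018-05-12 08:00',
        '2018-05-11 08:00', '2018-05-20 08:00',
    ]),
})

# Rows without a ProductID cannot name a product, so they are excluded first.
views = toy[(toy['Activity'] == 'PAGELOAD') & toy['ProductID'].notna()]

# After sorting by time, the last row per user is the most recent view.
latest = (
    views.sort_values('VisitDateTime')
         .drop_duplicates('UserID', keep='last')
         .rename(columns={'ProductID': 'Recently_Viewed_Product'})
         [['UserID', 'Recently_Viewed_Product']]
)
print(latest)
```

The explicit `notna()` filter mirrors what grouping by `ProductID` does implicitly, since `groupby` drops missing keys by default.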


In [42]:
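#Merge the above dataframe with the final submission dataframe
#Fill only the new column so that missing values in other columns are not overwritten
my_submission = my_submission.merge(recently_viewed_product, on='UserID', how='left')
my_submission['Recently_Viewed_Product'] = my_submission['Recently_Viewed_Product'].fillna('Product101')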

In [43]:
my_submission

Unnamed: 0,UserID,No_of_days_Visited_7_Days,No_Of_Products_Viewed_15_Days,User_Vintage,Most_Viewed_product_15_Days,Most_Active_OS,Recently_Viewed_Product
0,U100002,0.0,3.0,52,Pr100258,Android,Pr100258
1,U100003,1.0,3.0,1020,Pr100079,Windows,Pr100079
2,U100004,2.0,16.0,340,Pr100753,Windows,Pr100753
3,U100005,1.0,3.0,680,Pr100234,Android,Pr100234
4,U100006,1.0,1.0,54,Pr101111,Android,Pr101111
...,...,...,...,...,...,...,...
34045,U136960,0.0,0.0,754,Product101,Windows,Product101
34046,U136961,0.0,0.0,59,Product101,Android,Pr101381
34047,U136963,1.0,1.0,1419,Product101,Android,Pr100166
34048,U136964,0.0,0.0,494,Product101,Windows,Product101


## 7. Pageloads_last_7_days:

### Count of Page loads in the last 7 days by the user.

In [44]:
pageloads_last_7_days = last_7_days[last_7_days.Activity == 'PAGELOAD'].groupby('UserID')['Activity'].count()\
                                                                        .reset_index(name='Pageloads_last_7_days')

In [45]:
pageloads_last_7_days

Unnamed: 0,UserID,Pageloads_last_7_days
0,U100003,1
1,U100004,2
2,U100005,1
3,U100006,1
4,U100008,23
...,...,...
15515,U136853,1
15516,U136934,1
15517,U136936,1
15518,U136949,1


In [46]:
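#Merge the above dataframe with the final submission dataframe
#Fill only the new column so that missing values in other columns are not overwritten
my_submission = my_submission.merge(pageloads_last_7_days, on='UserID', how='left')
my_submission['Pageloads_last_7_days'] = my_submission['Pageloads_last_7_days'].fillna(0)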

In [47]:
my_submission

Unnamed: 0,UserID,No_of_days_Visited_7_Days,No_Of_Products_Viewed_15_Days,User_Vintage,Most_Viewed_product_15_Days,Most_Active_OS,Recently_Viewed_Product,Pageloads_last_7_days
0,U100002,0.0,3.0,52,Pr100258,Android,Pr100258,0.0
1,U100003,1.0,3.0,1020,Pr100079,Windows,Pr100079,1.0
2,U100004,2.0,16.0,340,Pr100753,Windows,Pr100753,2.0
3,U100005,1.0,3.0,680,Pr100234,Android,Pr100234,1.0
4,U100006,1.0,1.0,54,Pr101111,Android,Pr101111,1.0
...,...,...,...,...,...,...,...,...
34045,U136960,0.0,0.0,754,Product101,Windows,Product101,0.0
34046,U136961,0.0,0.0,59,Product101,Android,Pr101381,0.0
34047,U136963,1.0,1.0,1419,Product101,Android,Pr100166,0.0
34048,U136964,0.0,0.0,494,Product101,Windows,Product101,0.0


## 8. Clicks_last_7_days:

### Count of Clicks in the last 7 days by the user.

In [48]:
clicks_last_7_days = last_7_days[last_7_days.Activity == 'CLICK'].groupby('UserID')['Activity'].count()\
                                                                        .reset_index(name='Clicks_last_7_days')

In [49]:
clicks_last_7_days

Unnamed: 0,UserID,Clicks_last_7_days
0,U100003,2
1,U100005,1
2,U100008,31
3,U100009,6
4,U100012,20
...,...,...
15021,U136888,1
15022,U136921,1
15023,U136924,1
15024,U136959,1


In [50]:
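#Merge the above dataframe with the final submission dataframe
#Fill only the new column so that missing values in other columns are not overwritten
my_submission = my_submission.merge(clicks_last_7_days, on='UserID', how='left')
my_submission['Clicks_last_7_days'] = my_submission['Clicks_last_7_days'].fillna(0)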

In [51]:
my_submission

Unnamed: 0,UserID,No_of_days_Visited_7_Days,No_Of_Products_Viewed_15_Days,User_Vintage,Most_Viewed_product_15_Days,Most_Active_OS,Recently_Viewed_Product,Pageloads_last_7_days,Clicks_last_7_days
0,U100002,0.0,3.0,52,Pr100258,Android,Pr100258,0.0,0.0
1,U100003,1.0,3.0,1020,Pr100079,Windows,Pr100079,1.0,2.0
2,U100004,2.0,16.0,340,Pr100753,Windows,Pr100753,2.0,0.0
3,U100005,1.0,3.0,680,Pr100234,Android,Pr100234,1.0,1.0
4,U100006,1.0,1.0,54,Pr101111,Android,Pr101111,1.0,0.0
...,...,...,...,...,...,...,...,...,...
34045,U136960,0.0,0.0,754,Product101,Windows,Product101,0.0,0.0
34046,U136961,0.0,0.0,59,Product101,Android,Pr101381,0.0,0.0
34047,U136963,1.0,1.0,1419,Product101,Android,Pr100166,0.0,1.0
34048,U136964,0.0,0.0,494,Product101,Windows,Product101,0.0,0.0
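
Features 7 and 8 filter the same 7-day frame twice, once per activity type. As an alternative sketch (not the approach used above), `pd.crosstab` yields both counts in one pass and fills absent combinations with 0. Toy data:

```python
import pandas as pd

toy = pd.DataFrame({
    'UserID': ['U1', 'U1', 'U1', 'U2'],
    'Activity': ['CLICK', 'PAGELOAD', 'CLICK', 'PAGELOAD'],
})

# One row per user, one column per activity type, zeros where a user has none.
counts = (
    pd.crosstab(toy['UserID'], toy['Activity'])
      .rename(columns={'PAGELOAD': 'Pageloads_last_7_days',
                       'CLICK': 'Clicks_last_7_days'})
      .reset_index()
)
print(counts)
```

Users who do not appear in the 7-day frame at all are still absent from `counts`, so the left merge onto the submission frame and a fill with 0 remain necessary.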


In [52]:
#Converting final dataframe to csv file
#my_submission.to_csv('my_submission.csv', index=False)
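
Before writing the file, a few structural checks against the sample submission can catch problems early. The helper `check_submission` below is hypothetical and only a sketch. It assumes the sample file defines the expected column names and order, which is not confirmed by anything shown above.

```python
import pandas as pd

def check_submission(submission, sample):
    """Return a list of problems; an empty list means the frames line up."""
    problems = []
    if list(submission.columns) != list(sample.columns):
        problems.append('column names or order differ from the sample')
    if len(submission) != len(sample):
        problems.append('row count differs from the sample')
    if submission['UserID'].duplicated().any():
        problems.append('duplicate UserID rows')
    if submission.isna().any().any():
        problems.append('missing values remain')
    return problems

# Toy frames standing in for sample_df and my_submission.
sample = pd.DataFrame({'UserID': ['U1', 'U2'], 'User_Vintage': [0, 0]})
good = pd.DataFrame({'UserID': ['U1', 'U2'], 'User_Vintage': [52, 1020]})
bad = pd.DataFrame({'UserID': ['U1', 'U1'], 'User_Vintage': [52, None]})

print(check_submission(good, sample))
print(check_submission(bad, sample))
```

In the notebook the call would be `check_submission(my_submission, sample_df)`, run just before `to_csv`.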