__Name:__ Amrita Veshin <br>
__Email:__ amritav99@gmail.com

-------------------------------------------------------------------------------------------------------
# <center> Generating User Behaviour Analysis Data Using Pandas and the Faker Library </center>
-------------------------------------------------------------------------------------------------------
## Introduction

Welcome to this Google Colab notebook on generating user behavior analysis data with pandas and the `Faker` library. Understanding user behavior helps teams optimize products and services, improve user experiences, and make informed business decisions. However, real user data is often hard to obtain for analysis because of privacy concerns or limited availability.

In this notebook, we address this challenge by generating synthetic user behavior data with Python and the `Faker` library. The resulting dataset has the shape of a real interaction log (user IDs, sessions, dates, page types, actions, and products), but every value is drawn at random. That makes it a practice dataset for professionals, students, and organizations who want to explore user behavior analysis techniques, not a source of conclusions about real users.


## Importing Necessary Libraries

In [3]:
!pip install faker

import random

import pandas as pd
from faker import Faker


Collecting faker
  Downloading Faker-19.6.2-py3-none-any.whl (1.7 MB)
Installing collected packages: faker
Successfully installed faker-19.6.2


## Defining Required Objects

In [11]:
# Create a Faker object to generate random user data
fake = Faker()

# Define lists of possible actions and page types
actions = ['Viewed', 'Added_to_Cart', 'Initiated', 'Purchase']
page_types = ['Landing_Page', 'Product_Page', 'Cart', 'Checkout', 'Thank_You']

# Initialize an empty list to store data
data = []

# Create a set to track unique user IDs
unique_user_ids = set()


## Generating User Behaviour Data with a for Loop

In [12]:
# Generate 10,000 records
for _ in range(10000):
    # Redraw until an unused ID comes up, so every record gets a distinct user ID
    while True:
        user_id = fake.random_int(min=1, max=10000)
        if user_id not in unique_user_ids:
            unique_user_ids.add(user_id)
            break

    session_id = fake.random_int(min=1000, max=9999)
    date = fake.date_between(start_date='-30d', end_date='today')  # Random date within the last 30 days
    page_type = random.choice(page_types)
    action = random.choice(actions)
    product_viewed = fake.random_element(elements=['Product_A', 'Product_B', 'Product_C', 'Product_D'])
    cart_added = random.randint(0, 1)  # Drawn independently of `action`, so the two can disagree
    purchase_made = 1 if action == 'Purchase' else 0  # Flag rows whose action is a purchase

    data.append([user_id, session_id, date, page_type, action, product_viewed, cart_added, purchase_made])
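
The uniqueness loop above keeps redrawing until it finds an unused ID. Since the 10,000 records take their IDs from the range 1 to 10000, it ends up using every ID exactly once, in a random order. A possible alternative, sketched here with the standard library's `random.sample` instead of the notebook's `Faker`-based loop, draws all the IDs up front without retries. The names `n_records` and `user_ids` are illustrative.

```python
import random

n_records = 10000

# Sample without replacement: every ID is distinct by construction
user_ids = random.sample(range(1, n_records + 1), n_records)

print(len(user_ids), len(set(user_ids)))
print(min(user_ids), max(user_ids))
```

The generation loop could then iterate over `user_ids` directly, one record per ID.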


## Creating a DataFrame to store the generated data

In [13]:
# Create a DataFrame from the generated data
df = pd.DataFrame(data, columns=['User_ID', 'Session_ID', 'Date', 'Page_Type', 'Action', 'Product_Viewed', 'Cart_Added', 'Purchase_Made'])
print(df.head())

   User_ID  Session_ID        Date     Page_Type         Action  \
0     5330        6199  2023-09-16  Product_Page      Initiated   
1     3250        3684  2023-09-14  Landing_Page  Added_to_Cart   
2     5355        3305  2023-09-01  Landing_Page  Added_to_Cart   
3     6141        3173  2023-09-09  Product_Page         Viewed   
4     5958        3650  2023-09-06          Cart         Viewed   

  Product_Viewed  Cart_Added  Purchase_Made  
0      Product_B           1              0  
1      Product_C           0              0  
2      Product_D           1              0  
3      Product_C           0              0  
4      Product_D           0              0  
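
Row 1 of the preview has `Action` equal to `Added_to_Cart` but `Cart_Added` equal to 0, because the generation loop draws `Cart_Added` independently of `Action`. If consistent flags matter for an analysis, one option is to derive both flags from the action. The helper `derive_flags` below is hypothetical, and treating a purchase as implying a cart addition is an assumption, not something the notebook states.

```python
actions = ['Viewed', 'Added_to_Cart', 'Initiated', 'Purchase']


def derive_flags(action):
    """Return (cart_added, purchase_made) implied by a single action."""
    # Assumption: a purchase implies the product was added to the cart first
    cart_added = 1 if action in ('Added_to_Cart', 'Purchase') else 0
    purchase_made = 1 if action == 'Purchase' else 0
    return cart_added, purchase_made


for action in actions:
    print(action, derive_flags(action))
```

`Initiated` is left at 0 for both flags here, since the notebook does not say what is being initiated.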


## Saving the DataFrame to a CSV file for downloading/storing

In [14]:
# Save the DataFrame to a CSV file
df.to_csv('user_behavior_dataset.csv', index=False)
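
When the CSV is loaded again, pandas reads the `Date` column as plain strings unless told otherwise. A minimal sketch of reading the file back with parsed dates and computing a simple purchase rate follows. It uses a four-row stand-in frame and a temporary directory, both assumptions for illustration, in place of the full dataset.

```python
import os
import tempfile

import pandas as pd

# Small stand-in for the generated DataFrame (illustrative values only)
df = pd.DataFrame({
    'User_ID': [1, 2, 3, 4],
    'Date': ['2023-09-16', '2023-09-14', '2023-09-01', '2023-09-09'],
    'Action': ['Viewed', 'Purchase', 'Added_to_Cart', 'Purchase'],
    'Purchase_Made': [0, 1, 0, 1],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'user_behavior_dataset.csv')
    df.to_csv(path, index=False)
    # parse_dates converts the Date column to a datetime dtype while reading
    loaded = pd.read_csv(path, parse_dates=['Date'])

# Share of rows whose action was a purchase
purchase_rate = loaded['Purchase_Made'].mean()
print(loaded.dtypes)
print(purchase_rate)
```

With a datetime `Date` column, operations such as filtering by date range or grouping by day work without further conversion.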