## Generating User Interaction Data

Generating the user interaction data involves four steps:

1. **Generate User Data**: Create a dataset with 3000 users, each having the specified attributes (User ID, First Name, Last Name, Email, Age, Gender, Postcode, Country).

2. **Scrape Items Data**: Scrape at least 300 items from the Sunglass Hut website, ensuring each item has the specified attributes (Item ID, Brand, Style, Model, Color, Material, Shape, Size, Price, Prescription, Polarized, Lens details).

3. **Simulate User Interactions**:
   - For each user, simulate between 0 and 10 interactions, with a mean of 2 and a standard deviation of 3.
   - Ensure the interactions follow the specified probabilities for brand, style, size, color, price range, prescription, and polarization.

4. **Store the Data**: Save the generated data in a suitable format (e.g., CSV, JSON) for further analysis and model training.
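
Step 3 can be sketched as follows. This is a minimal illustration, not the notebook's actual implementation: it assumes the interaction count is drawn from a normal distribution, rounded, and clipped to the 0 to 10 range, and it assumes attribute choices are drawn with `numpy`'s weighted sampling. The helper names `sample_interaction_counts` and `sample_attribute`, and the brand labels and probabilities, are hypothetical placeholders.

```python
import numpy as np


def sample_interaction_counts(num_users, mean=2.0, std=3.0, low=0, high=10, seed=None):
    """Draw one interaction count per user from a rounded, clipped normal."""
    rng = np.random.default_rng(seed)
    raw = rng.normal(loc=mean, scale=std, size=num_users)
    # Round to whole interactions, then force values into [low, high]
    return np.clip(np.rint(raw), low, high).astype(int)


def sample_attribute(values, probabilities, size, seed=None):
    """Draw `size` attribute values according to the given probabilities."""
    rng = np.random.default_rng(seed)
    return rng.choice(values, size=size, p=probabilities)


counts = sample_interaction_counts(3000, seed=42)

# Placeholder categories and probabilities (assumed, not from the specification)
brands = ['Brand A', 'Brand B', 'Brand C']
brand_probabilities = [0.5, 0.3, 0.2]

# One brand draw per simulated interaction
interaction_brands = sample_attribute(
    brands, brand_probabilities, size=int(counts.sum()), seed=42
)
```

Because the draws are clipped to the 0 to 10 range, the realized mean and standard deviation will differ from the nominal values of 2 and 3.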



---

In [3]:
from faker import Faker
from faker.providers import internet
import pandas as pd
import numpy as np

# Australian English locale for the generated names
faker = Faker(['en_AU'])
# Register the internet provider used for domain names
faker.add_provider(internet)

In [10]:
# Population data for Australian states and territories
population_data = {
    'ACT': 431215,
    'NSW': 8166369,
    'NT': 246500,
    'QLD': 5184847,
    'SA': 1770591,
    'TAS': 541100,
    'VIC': 6680648,
    'WA': 2668981
}

# Total population
total_population = sum(population_data.values())

# Calculate proportions
proportions = {state: population / total_population for state, population in population_data.items()}

# Number of postcodes to generate
num_postcodes = 3000

# Generate postcodes based on proportions
postcodes = []
for state, proportion in proportions.items():
    num_state_postcodes = int(proportion * num_postcodes)
    if state == 'ACT':
        num_state_postcodes //= 2  # Halve the number for each range
        postcodes.extend(np.random.randint(2600, 2619, num_state_postcodes).tolist())
        postcodes.extend(np.random.randint(2900, 3000, num_state_postcodes).tolist())
    elif state == 'NT':
        num_state_postcodes //= 2  # Halve the number for each range
        # NT postcodes are 0800-0999; as integers they lose the leading zero
        postcodes.extend(np.random.randint(800, 900, num_state_postcodes).tolist())
        postcodes.extend(np.random.randint(900, 1000, num_state_postcodes).tolist())
    elif state == 'NSW':
        postcodes.extend(np.random.randint(2000, 3000, num_state_postcodes).tolist())
    elif state == 'QLD':
        postcodes.extend(np.random.randint(4000, 5000, num_state_postcodes).tolist())
    elif state == 'SA':
        postcodes.extend(np.random.randint(5000, 6000, num_state_postcodes).tolist())
    elif state == 'TAS':
        postcodes.extend(np.random.randint(7000, 8000, num_state_postcodes).tolist())
    elif state == 'VIC':
        postcodes.extend(np.random.randint(3000, 4000, num_state_postcodes).tolist())
    elif state == 'WA':
        postcodes.extend(np.random.randint(6000, 7000, num_state_postcodes).tolist())

# Adjust the number of postcodes to match the specified total
if len(postcodes) < num_postcodes:
    additional_postcodes = np.random.choice(postcodes, num_postcodes - len(postcodes), replace=True).tolist()
    postcodes.extend(additional_postcodes)
elif len(postcodes) > num_postcodes:
    postcodes = postcodes[:num_postcodes]


post = pd.DataFrame({
    'Postcode': postcodes,
})

post.shape

(3000, 1)
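
An alternative to the per-state loop with a top-up step is to draw a state for each postcode directly, weighted by population. The sketch below is one possible variant, not the method used in the cell above: it reuses the same population figures and postcode ranges, returns exactly `n` values without a later adjustment, and formats each postcode as a zero-padded string. The function name `sample_postcodes` and the `postcode_ranges` mapping are hypothetical.

```python
import numpy as np

# Same population figures as the cell above
population_data = {
    'ACT': 431215, 'NSW': 8166369, 'NT': 246500, 'QLD': 5184847,
    'SA': 1770591, 'TAS': 541100, 'VIC': 6680648, 'WA': 2668981,
}

# (low, high) ranges with an exclusive upper bound, mirroring the cell above
postcode_ranges = {
    'ACT': [(2600, 2619), (2900, 3000)],
    'NSW': [(2000, 3000)],
    'NT': [(800, 900), (900, 1000)],
    'QLD': [(4000, 5000)],
    'SA': [(5000, 6000)],
    'TAS': [(7000, 8000)],
    'VIC': [(3000, 4000)],
    'WA': [(6000, 7000)],
}


def sample_postcodes(n, seed=None):
    """Return n postcodes as 4-character strings, weighted by state population."""
    rng = np.random.default_rng(seed)
    states = list(population_data)
    weights = np.array([population_data[s] for s in states], dtype=float)
    weights /= weights.sum()

    result = []
    for state in rng.choice(states, size=n, p=weights):
        ranges = postcode_ranges[str(state)]
        # Pick one of the state's ranges uniformly, then a postcode within it
        low, high = ranges[int(rng.integers(len(ranges)))]
        result.append(f"{int(rng.integers(low, high)):04d}")
    return result


sampled = sample_postcodes(3000, seed=0)
```

Storing postcodes as strings keeps the leading zero of Northern Territory values such as `0800`.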

In [15]:
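num_users = 3000

# Unique numeric user IDs (avoid shadowing the built-in `id`)
user_ids = [faker.unique.random_int(min=100000, max=9999999) for _ in range(num_users)]
# Draw first and last names separately: faker.name() may return a prefix,
# suffix, or multi-part name, which breaks a naive split on whitespace
first_names = [faker.first_name() for _ in range(num_users)]
last_names = [faker.last_name() for _ in range(num_users)]
emails = [
    f"{first}.{last}".replace(' ', '') + f"@{faker.domain_name()}"
    for first, last in zip(first_names, last_names)
]
ages = [faker.random_int(min=18, max=70) for _ in range(num_users)]
# Gender is sampled independently of the generated name
genders = [faker.random_element(elements=('Male', 'Female')) for _ in range(num_users)]
user_postcodes = postcodes[:num_users]  # Ensure the length matches
countries = ['Australia'] * num_users

users = pd.DataFrame({
    'User ID': user_ids,
    'First Name': first_names,
    'Last Name': last_names,
    'Email': emails,
    'Age': ages,
    'Gender': genders,
    'Postcode': user_postcodes,
    'Country': countries
})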

In [17]:
users.head()

| | User ID | First Name | Last Name | Email | Age | Gender | Postcode | Country |
|---|---|---|---|---|---|---|---|---|
| 0 | 9182230 | Karen | Lewis | Karen.Lewis@moore.net | 24 | Male | 2606 | Australia |
| 1 | 9490870 | James | Wallace | James.Wallace@salas.biz | 29 | Male | 2615 | Australia |
| 2 | 1462203 | Michael | Foster | Michael.Foster@brewer.org.au | 51 | Male | 2613 | Australia |
| 3 | 7131610 | Michael | Walter | Michael.Walter@mason-benson.org | 24 | Female | 2609 | Australia |
| 4 | 6787791 | April | Campbell | April.Campbell@hill.edu | 29 | Female | 2608 | Australia |


In [18]:
users.to_csv('../data/users.csv', index=False, header=True)
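
Because the postcodes above are generated as integers, a Northern Territory value such as 0800 is written to the CSV as `800`. A minimal sketch of restoring the four-digit form when the file is read back is shown below. It uses a small in-memory frame rather than `../data/users.csv`, and the sample rows are made up for illustration.

```python
import io

import pandas as pd

# Tiny illustrative frame; the values are placeholders, not real generated users
sample = pd.DataFrame({'User ID': [100001, 100002], 'Postcode': [800, 2606]})

# Write to an in-memory buffer instead of a file on disk
buffer = io.StringIO()
sample.to_csv(buffer, index=False, header=True)
buffer.seek(0)

# Read postcodes as strings and pad them back to four digits
restored = pd.read_csv(buffer, dtype={'Postcode': str})
restored['Postcode'] = restored['Postcode'].str.zfill(4)

print(restored['Postcode'].tolist())  # ['0800', '2606']
```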