# LifeHarmony: An AI Recommender System for a Balanced Life
Because no suitable real-world dataset is available, we start by generating a synthetic dataset to train our model, with the aim that it generalizes to real user inputs.

## Our Dataset's Main Features:

- **Gender**: Helps refine recommendations based on demographic trends.

- **Age**: Aligns hobbies and suggestions with life stages.

- **Budget Allocated for Improving Life Balance**: Ensures recommendations are financially feasible.

- **Time Allocated for Improving Life Balance**: Fits suggestions within realistic time commitments.

- **Marital Status**: Helps prioritize areas like family or personal development.

- **Occupation**: Tailors suggestions to work-life balance needs.

- **Hobbies**: Identifies interests to recommend aligned activities.

- **Introvert/Ambivert/Extrovert**: Personalizes recommendations to social comfort levels.


## Step 1: Generating Behavioral Features Using Logical Probabilities

We apply specific logic to simulate real-world data and relationships. For instance:

- **Occupation vs. Budget**: Students, part-time workers, and unemployed users tend to have lower budgets, while full-time workers are likely to have higher disposable incomes.

- **Occupation vs. Time Allocated**: Full-time workers typically have less time for hobbies compared to students or freelancers.

- **Personality vs. Hobbies**: Introverts are more inclined towards solo activities (e.g., reading, writing), while extroverts prefer group activities (e.g., socializing, team sports).

- **Age vs. Marital Status**: Younger individuals are more likely to be single, while older individuals tend to be married.

These logical probabilities ensure that the generated dataset aligns with plausible, real-world patterns.
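
As a minimal sketch of one such rule, the snippet below samples marital status with age-dependent probabilities and then checks that the empirical rates follow the intended pattern. The helper name `sample_marital_status`, the seed, and the sample size are assumptions made for illustration; the 70/30 split at age 30 mirrors the rule described above.

```python
import numpy as np

rng = np.random.default_rng(0)  # assumed seed, for illustration only

def sample_marital_status(age, rng):
    """Return 'Single' with probability 0.7 below age 30, otherwise 0.3."""
    p_single = 0.7 if age < 30 else 0.3
    return "Single" if rng.random() < p_single else "Married"

# Ages drawn from a normal distribution and clipped to a plausible range
ages = np.clip(rng.normal(loc=35, scale=10, size=5000).astype(int), 12, 70)
statuses = np.array([sample_marital_status(age, rng) for age in ages])

# Empirical share of single users in each age group
young_single_rate = np.mean(statuses[ages < 30] == "Single")
older_single_rate = np.mean(statuses[ages >= 30] == "Single")
print(f"Single rate under 30: {young_single_rate:.2f}")
print(f"Single rate 30 and over: {older_single_rate:.2f}")
```

A check like this can be repeated for the other relationships (for example, mean budget per occupation) to confirm that the generated data reflects the intended logic.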


In [21]:
import pandas as pd
import numpy as np

# Set random seed for reproducibility
np.random.seed(42)

# Number of samples to generate
num_samples = 20000

# Generate User IDs with zero-padded numbers
user_ids = [f"307{str(i+1).zfill(5)}" for i in range(num_samples)]


# Generate age: normal distribution with mean 35 and standard deviation 10
ages = np.random.normal(loc=35, scale=10, size=num_samples).astype(int)
ages = np.clip(ages, 12, 70)  # Limit ages between 12 and 70

# Generate gender: 50-50 distribution
genders = np.random.choice(["Male", "Female"], size=num_samples)

# Generate marital status based on age (younger users are more likely single)
marital_status = [
    np.random.choice(["Single", "Married"], p=[0.7, 0.3]) if age < 30 else
    np.random.choice(["Single", "Married"], p=[0.3, 0.7])
    for age in ages
]

# Generate occupation
occupations = np.random.choice(
    ["Student", "Freelancer", "Part-time", "Full-time", "Unemployed"],
    size=num_samples,
    p=[0.2, 0.2, 0.1, 0.4, 0.1]
)

# Generate budget based on occupation (AED ranges)
budget = [
    np.random.randint(0, 1000) if occ in ["Student", "Unemployed", "Part-time"] else
    np.random.randint(0, 2000) if occ in ["Freelancer"] else
    np.random.randint(3000, 7000)
    for occ in occupations
]

# Generate time allocated for personal growth based on occupation
time_allocated = [
    np.random.randint(1, 5) if occ in ["Full-time"] else
    np.random.randint(3, 10) if occ in ["Part-time", "Freelancer"] else
    np.random.randint(5, 15)
    for occ in occupations
]

# Generate introvert/extrovert tendencies
personality = np.random.choice(
    ["Introvert", "Ambivert", "Extrovert"],
    size=num_samples,
    p=[0.4, 0.3, 0.3]
)

# Generate hobbies based on personality
hobbies = [
    np.random.choice(["Reading", "Art", "Writing", "Socializing", "Exercise"])
    if pers == "Ambivert" else
    np.random.choice(["Reading", "Writing", "Art"])
    if pers == "Introvert" else
    np.random.choice(["Socializing", "Exercise", "Art"])
    for pers in personality
]

# Combine into a DataFrame
data = pd.DataFrame({
    "UserID": user_ids,
    "Age": ages,
    "Gender": genders,
    "Marital Status": marital_status,
    "Occupation": occupations,
    "Budget": budget,
    "Time Allocated (hrs/week)": time_allocated,
    "Personality": personality,
    "Hobbies": hobbies
})

# Save the complete dataset
data.to_excel("generated_datasets/1_generated_dataset.xlsx", index=False)
print("Dataset generated and saved as '1_generated_dataset.xlsx'.")

Dataset generated and saved as '1_generated_dataset.xlsx'.


## Step 2: Generating Wheel of Life Features Using Logical Relations

In this step, we use the **behavioral features from Step 1** (e.g., hobbies, occupation, marital status, budget) to determine how users might **rate their current life domains** (e.g., Career, Financial, Physical). These ratings reflect the user's current satisfaction with, or focus on, each area.

### Process:
1. **Generate Current Ratings**:
   - Each domain's current rating is derived logically based on user attributes:
     - Example: If the user enjoys exercise, their **Physical** rating will be higher.
     - Example: A full-time employee might have a higher **Career** rating.

2. **Generate Future Goal Ratings**:
   - For each domain, a random increment (0 to 5 in the code below) is added to the current rating and the result is capped at 10, so the goal is always **greater than or equal to the current rating**.
   - This represents the user's aspirational goal for that domain.

3. **Calculate the Gap**:
   - The difference between the future goal and current ratings (`Goal - Current`) is computed for each domain.

4. **Normalize Gaps**:
   - Each gap is divided by the sum of the user's gaps across all domains to identify **relative imbalances**.

5. **Classify Priorities**:
   - Based on the 33.33rd and 66.67th percentiles of the user's normalized gaps:
     - **High Priority**: Domains with gaps above the upper threshold.
     - **Medium Priority**: Domains with gaps between the two thresholds.
     - **Low Priority**: Domains with gaps at or below the lower threshold.

### Outcome:
By identifying the priorities for each domain, this step helps highlight areas where the user should focus to achieve a balanced life.
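
The gap, normalization, and classification steps can be traced for a single user with a small worked example. The ratings below are hypothetical values chosen by hand; only the percentile thresholds (33.33 and 66.67) follow the process described above.

```python
import numpy as np

# Hypothetical current and goal ratings for one user (0-10 scale)
current = {"Career": 8, "Financial": 3, "Spiritual": 5, "Physical": 4,
           "Intellectual": 7, "Family": 6, "Social": 2, "Fun": 5}
goal = {"Career": 9, "Financial": 8, "Spiritual": 6, "Physical": 8,
        "Intellectual": 7, "Family": 9, "Social": 4, "Fun": 10}

# Step 3: gap between goal and current rating per domain
gaps = {domain: goal[domain] - current[domain] for domain in current}

# Step 4: divide by the total gap so the values sum to 1
total_gap = sum(gaps.values())
normalized = ({d: g / total_gap for d, g in gaps.items()}
              if total_gap > 0 else {d: 0.0 for d in gaps})

# Step 5: classify with the 33.33rd and 66.67th percentiles as thresholds
low_cut, high_cut = np.percentile(list(normalized.values()), [33.33, 66.67])
priorities = {
    d: "Low" if v <= low_cut else "Medium" if v <= high_cut else "High"
    for d, v in normalized.items()
}

for domain, level in priorities.items():
    print(f"{domain:<12} gap={gaps[domain]} priority={level}")
```

For this user, the domains with the largest gaps (Financial, Fun, Physical) come out as High, while Intellectual, with a gap of zero, comes out as Low.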


In [22]:
import pandas as pd
import numpy as np

# Load the previously generated dataset
data = pd.read_excel("generated_datasets/1_generated_dataset.xlsx")

# List of features for balanced life
life_features = ["Career", "Financial", "Spiritual", "Physical", "Intellectual", "Family", "Social", "Fun"]

# Function to calculate Current Ratings based on user attributes
def calculate_current_ratings(row):
    ratings = {}

    # Career
    if row["Occupation"] == "Full-time":
        ratings["Career"] = np.random.randint(7, 11)  # Upper limit is exclusive in randint
    elif row["Occupation"] in ["Part-time", "Freelancer"]:
        ratings["Career"] = np.random.randint(4, 8)
    else:
        ratings["Career"] = np.random.randint(0, 5)

    # Financial
    ratings["Financial"] = min(int(row["Budget"] / 7000 * 10), 10)  # Scale budget to 0-10 and cast to int

    # Physical
    if row["Hobbies"] == "Exercise":
        ratings["Physical"] = np.random.randint(7, 11)
    else:
        ratings["Physical"] = np.random.randint(3, 7)

    # Intellectual
    if row["Hobbies"] in ["Reading", "Writing"]:
        ratings["Intellectual"] = np.random.randint(7, 11)
    else:
        ratings["Intellectual"] = np.random.randint(3, 7)

    # Family
    if row["Marital Status"] == "Married":
        ratings["Family"] = np.random.randint(6, 11)
    else:
        ratings["Family"] = np.random.randint(3, 7)

    # Social
    if row["Personality"] == "Extrovert":
        ratings["Social"] = np.random.randint(7, 11)
    elif row["Personality"] == "Ambivert":
        ratings["Social"] = np.random.randint(4, 8)
    else:
        ratings["Social"] = np.random.randint(2, 6)

    # Fun
    if row["Hobbies"] in ["Art", "Socializing", "Exercise"]:
        ratings["Fun"] = np.random.randint(6, 11)
    else:
        ratings["Fun"] = np.random.randint(3, 7)

    # Spiritual (random unless specific logic is available)
    ratings["Spiritual"] = np.random.randint(3, 8)

    return ratings

# Generate Current Ratings for each row
balanced_data = []
for _, row in data.iterrows():
    # Calculate current ratings
    current_ratings = calculate_current_ratings(row)
    
    # Generate goal ratings (always >= current ratings)
    goal_ratings = {key: min(current_ratings[key] + np.random.randint(0, 6), 10) for key in current_ratings}
    
    # Calculate gaps
    gaps = {key: goal_ratings[key] - current_ratings[key] for key in current_ratings}

    # Add small uniform noise to every gap so that exact ties between domains are unlikely
    gaps = {key: gaps[key] + np.random.uniform(-0.001, 0.001) for key in gaps}

    # Normalize gaps to determine relative priorities
    total_gap = sum(gaps.values())
    normalized_gaps = {key: gaps[key] / total_gap for key in gaps} if total_gap > 0 else {key: 0 for key in gaps}

    # Assign priorities
    if len(set(normalized_gaps.values())) == 1:  # Check if all gaps are equal
        # Apply a tie-breaking rule (e.g., alphabetical order of domain names)
        sorted_domains = sorted(normalized_gaps.keys())
        priorities = {
            key: "High" if key == sorted_domains[0] else "Medium" if key == sorted_domains[1] else "Low"
            for key in normalized_gaps
        }
    else:
        # Regular priority assignment using thresholds
        thresholds = np.percentile(list(normalized_gaps.values()), [33.33, 66.67])
        priorities = {
            key: "Low" if normalized_gaps[key] <= thresholds[0] else
                 "Medium" if normalized_gaps[key] <= thresholds[1] else
                 "High"
            for key in normalized_gaps
        }

    # Append data for the current row
    balanced_data.append({
        **row.to_dict(),
        **{f"{feature}_Current": current_ratings[feature] for feature in life_features},
        **{f"{feature}_Goal": goal_ratings[feature] for feature in life_features},
        **{f"{feature}_Gap": gaps[feature] for feature in life_features},
        **{f"{feature}_Priority": priorities[feature] for feature in life_features}
    })

# Create a new DataFrame with extended data
extended_data = pd.DataFrame(balanced_data)

# Save the extended dataset to a new Excel file
extended_data.to_excel("generated_datasets/2_generated_dataset.xlsx", index=False)

print("Extended dataset generated and saved as '2_generated_dataset.xlsx'.")


Extended dataset generated and saved as '2_generated_dataset.xlsx'.


## Step 3: Removing Irrelevant Features

In this step, we remove the columns containing `Current`, `Goal`, and `Gap` values for each life domain, as they are not relevant to the decision-making process. Our focus is solely on the **priority/importance** assigned to each domain.

This simplifies the dataset while retaining only the necessary information for further analysis.
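
The effect of this cleanup can be illustrated on a toy frame. The sketch below uses hypothetical values and a single domain; it selects columns with a regular expression anchored at the end of the name, which is one possible variant of the substring match used in the notebook cell.

```python
import pandas as pd

# Toy frame with hypothetical values; column names follow the Step 2 suffix pattern
toy = pd.DataFrame({
    "UserID": ["30700001"],
    "Career_Current": [8],
    "Career_Goal": [9],
    "Career_Gap": [1.0],
    "Career_Priority": ["Low"],
})

# Mark every column whose name ends in one of the three suffixes
is_rating_column = toy.columns.str.contains(r"_(?:Current|Goal|Gap)$")

# Keep only the remaining columns
cleaned = toy.loc[:, ~is_rating_column]

print(list(cleaned.columns))
```

Anchoring the pattern at the end of the name avoids dropping a column that merely contains one of the keywords somewhere in the middle, which may or may not matter for a given dataset.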


In [23]:
import pandas as pd

# Load the dataset with all features
dataset_path = "generated_datasets/2_generated_dataset.xlsx"
data = pd.read_excel(dataset_path)

# Identify and drop columns related to Current, Goal, and Gap
columns_to_remove = [col for col in data.columns if any(keyword in col for keyword in ["_Current", "_Goal", "_Gap"])]
data_cleaned = data.drop(columns=columns_to_remove, errors="ignore")  # Remove matching columns

# Save the cleaned dataset
cleaned_dataset_path = "generated_datasets/3_generated_dataset.xlsx"
data_cleaned.to_excel(cleaned_dataset_path, index=False)

print(f"Cleaned dataset saved as '{cleaned_dataset_path}'.")

Cleaned dataset saved as 'generated_datasets/3_generated_dataset.xlsx'.


## Step 4: Generating Preliminary Recommendations Using Logical Relations

This step uses **logical relations and statements** to generate recommendations for each user. For every high-priority domain, the approach considers the user's attributes, such as personality, hobbies, budget, and occupation, to assign a **list of recommendations**.

These recommendations are determined using predefined logical rules that match the user's behavioral features, ensuring actionable insights tailored to their needs and preferences. This approach serves as a foundation for future enhancements and decision-making models.
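
One way to picture such predefined rules is as a table of (condition, recommendation) pairs that is scanned for each user. The sketch below is an illustrative alternative layout, not the notebook's implementation: the names `RULES`, `recommend`, and `example_user` are hypothetical, and only three rules are shown.

```python
from typing import Callable, Dict, List, Tuple

User = Dict[str, object]
Rule = Tuple[Callable[[User], bool], str]

# Hypothetical rule table: each entry pairs a condition with a recommendation
RULES: List[Rule] = [
    (lambda u: u["Financial_Priority"] == "High" and u["Budget"] < 1000,
     "Start tracking weekly expenses using an app"),
    (lambda u: u["Physical_Priority"] == "High" and u["Age"] >= 50,
     "Consider low-impact fitness plans like walking or swimming"),
    (lambda u: u["Social_Priority"] == "High" and u["Personality"] == "Extrovert",
     "Attend social events or meetups"),
]

def recommend(user: User) -> List[str]:
    """Return the recommendation of every rule whose condition holds for the user."""
    return [text for condition, text in RULES if condition(user)]

# Hypothetical user record with only the fields the rules above inspect
example_user = {
    "Age": 24,
    "Budget": 600,
    "Personality": "Extrovert",
    "Financial_Priority": "High",
    "Physical_Priority": "Low",
    "Social_Priority": "High",
}

suggestions = recommend(example_user)
print(suggestions)
```

A table like this keeps each rule independent, which can make it easier to add or audit rules, at the cost of losing the nested structure that explicit `if`/`elif` chains express directly.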


In [24]:
import pandas as pd

# Load the dataset
generated_dataset_path = "generated_datasets/3_generated_dataset.xlsx"
data = pd.read_excel(generated_dataset_path)

# Define a function for logical recommendations
def generate_suggestions(row):
    recommendations = []

    # Career
    if row["Career_Priority"] == "High":
        if row["Occupation"] == "Full-time":
            recommendations.append("Regularly update your resume")
            recommendations.append("Allocate 1-3 hours a week to improving your career-related skills")
        elif row["Occupation"] == "Student":
            recommendations.append("Start an online course")
            recommendations.append("Look into internship opportunities")
        else:
            recommendations.append("Allocate 1-3 hours a week to improving your career-related skills")
        # Nested logic for Social Priority
        if row["Social_Priority"] == "High":
            recommendations.append("Attend career fairs and networking events")
        elif row["Social_Priority"] == "Medium":
            recommendations.append("Explore online professional networking platforms")
    
    # Financial
    if row["Financial_Priority"] == "High":
        if row["Budget"] < 1000:
            recommendations.append("Start tracking weekly expenses using an app")
            recommendations.append("Build an emergency savings fund")
        elif row["Occupation"] == "Full-time" and row["Age"] > 40:
            recommendations.append("Automate savings for retirement")
            recommendations.append("Start investing in retirement plans")
        else:
            recommendations.append("Build financial awareness")
            recommendations.append("Track monthly expenses")
    
    # Spiritual
    if row["Spiritual_Priority"] == "High":
        recommendations.append("Allocate time for meditation or reflective practices")
        recommendations.append("Spend time in nature")
    
    # Physical
    if row["Physical_Priority"] == "High":
        if row["Age"] >= 50:
            recommendations.append("Consider low-impact fitness plans like walking or swimming")
        elif row["Budget"] > 1500:
            recommendations.append("Consider getting a gym membership")
        else:
            recommendations.append("Exercise regularly")
            recommendations.append("Use affordable fitness apps")
        recommendations.append("Eat healthy foods")
        recommendations.append("Attend regular preventative medical checkups")
        # Nested logic for Fun Priority
        if row["Fun_Priority"] == "High":
            recommendations.append("Try community hikes or dance classes")
    
    # Intellectual
    if row["Intellectual_Priority"] == "High":
        if row["Hobbies"] == "Reading":
            recommendations.append("Join book clubs")
            recommendations.append("Set personal reading goals")
        elif row["Hobbies"] == "Writing":
            recommendations.append("Participate in writing competitions")
            recommendations.append("Join writing circles")
        recommendations.append("Listen to podcasts or audiobooks about topics of interest")
    
    # Family
    if row["Family_Priority"] == "High":
        if row["Marital Status"] == "Married":
            recommendations.append("Plan family outings")
            recommendations.append("Host weekly family game nights")
        else:
            recommendations.append("Visit family members regularly")
            recommendations.append("Plan family gatherings")

    # Social
    if row["Social_Priority"] == "High":
        if row["Hobbies"] == "Art":
            recommendations.append("Join group art classes")
            recommendations.append("Visit art fairs with friends")
        if row["Personality"] == "Extrovert":
            recommendations.append("Attend social events or meetups")
        elif row["Personality"] in ["Introvert", "Ambivert"]:
            recommendations.append("Join small, interest-based groups")
            recommendations.append("Participate in online communities")
    
    # Fun
    if row["Fun_Priority"] == "High":
        if row["Personality"] == "Extrovert":
            recommendations.append("Participate in group adventure activities or social game nights")
            recommendations.append("Join social recreational activities")
        elif row["Personality"] in ["Introvert", "Ambivert"]:
            recommendations.append("Engage in hobbies you enjoy")
            recommendations.append("Watch your favorite shows")
        if row["Hobbies"] == "Art":
            recommendations.append("Explore an art workshop")
    
    return recommendations

# Apply the function to the dataset
data["Recommendations"] = data.apply(generate_suggestions, axis=1)

# Save the updated dataset with recommendations
output_path = "generated_datasets/4_generated_dataset_with_recommendations.xlsx"
data.to_excel(output_path, index=False)

print(f"Recommendations saved to: {output_path}")


Recommendations saved to: generated_datasets/4_generated_dataset_with_recommendations.xlsx
