
The dataset generation code below constructs a synthetic dataset simulating factors that influence an individual's Ikigai: the intersection of what they love, what they are good at, what the world needs, and what they can be rewarded for. This section breaks down the factors in the dataset and the logic used to generate them.

### Factors in the Dataset

1. **Motivation**:
   - Possible Values: ['Helping others', 'Achieving success', 'Creative expression', 'Learning new things', 'Solving problems', 'Exploring the world']
   - **Weighting**: Emphasizes helping and learning more heavily, reflecting common motivational themes in young adults.

2. **Skill**:
   - Possible Values: ['Communication', 'Analytical thinking', 'Leadership', 'Craftsmanship', 'Empathy', 'Creativity']
   - **Weighting**: Focuses on empathy and analytical thinking, recognizing these as valuable skills for many roles.

3. **Environment**:
   - Possible Values: ['A structured and organized setting', 'A fast-paced, challenging workplace', 'A creative and artistic space', 'A supportive and collaborative team', 'A flexible and independent work style', 'An outdoor, nature-oriented environment']
   - **Weighting**: Favors supportive and flexible environments, reflecting preferences for workplace culture.

4. **Impact**:
   - Possible Values: ['Making a difference', 'Innovating new technologies', 'Promoting equality and justice', 'Fostering education and knowledge', 'Protecting the environment', 'Creating beauty or art']
   - **Weighting**: Places higher importance on making a difference, a common aspiration among individuals seeking fulfillment.

5. **Known For**:
   - Possible Values: ['Empathy', 'Creativity', 'Knowledge and expertise', 'Leadership and influence', 'Contributions to society', 'Adventurous spirit']
   - **Weighting**: Highlights empathy and creativity, aligning with roles that require strong interpersonal skills or artistic talent.

6. **Ikigai**:
   - Possible Values: ['Social Worker', 'Engineer', 'Artist', 'Educator', 'Environmentalist', 'Leader']
   - This is the target variable determined by combining the previous factors.

7. **Age**:
   - A numeric value randomly generated between 15 and 25 to provide context about the demographic of individuals in the dataset.

### Equations and Logic Used in the Dataset Generation

1. **Weighted Random Selection**: Each factor is sampled independently using weighted probabilities; the weights are hand-chosen to reflect assumed age-appropriate interests:
   ```python
   np.random.choice(motivations, num_students, p=motivation_weights)
   ```

2. **Ikigai Determination Logic**: The `determine_ikigai` function uses conditional logic to assign an Ikigai role based on specific pairs of the other factors. Rules are checked in order and the first match wins, for example:
   ```python
   if row['motivation'] == 'Helping others' and row['known_for'] == 'Empathy':
       return 'Social Worker'
   ```

3. **Fallback Assignment**: If no strong match is found based on the defined criteria, a random Ikigai role is selected from the predefined roles to ensure every entry has a corresponding Ikigai:
   ```python
   return np.random.choice(ikigai_roles)
   ```
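As a sanity check on the weighted selection in step 1, the following minimal sketch draws a large sample for one factor and compares observed frequencies with the target weights. It assumes NumPy's `default_rng` generator instead of the global seed used in the notebook, and the sample size of 10,000 is an arbitrary choice for illustration.

```python
import numpy as np

# Options and weights copied from the notebook's motivation factor
motivations = ['Helping others', 'Achieving success', 'Creative expression',
               'Learning new things', 'Solving problems', 'Exploring the world']
motivation_weights = [0.25, 0.2, 0.15, 0.2, 0.1, 0.1]

# Weights passed as p= must sum to 1
assert abs(sum(motivation_weights) - 1.0) < 1e-9

# Assumption: a local Generator and a larger sample than the notebook's 1000 rows
rng = np.random.default_rng(42)
sample = rng.choice(motivations, size=10_000, p=motivation_weights)

# Empirical frequency of each option in the sample
values, counts = np.unique(sample, return_counts=True)
empirical = {str(v): c / counts.sum() for v, c in zip(values, counts)}

for option, weight in zip(motivations, motivation_weights):
    observed = empirical.get(option, 0.0)
    print(f"{option:22s} target={weight:.2f} observed={observed:.3f}")
```

With a sample this large, the observed frequencies should sit close to the targets; with the notebook's 1000 rows, somewhat larger deviations are expected.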

### Summary of Key Points

- **Synthetic Data Creation**: The dataset is generated randomly, with hand-chosen weights intended to approximate interests and inclinations among young individuals.
- **Ordered Rule Matching**: Ikigai is assigned by an ordered `if`/`elif` chain that checks specific pairs of attributes; the first matching rule wins.
- **Fallback Labels**: The fallback guarantees that every row has a role, but those labels are drawn uniformly at random and are independent of the features. Because each rule needs two specific values to co-occur, most rows reach the fallback, which limits how accurately any model can predict the target.

Overall, this methodology produces a synthetic dataset for modeling purposes, with weights and rules chosen to echo real-life considerations in personal fulfillment and career choice.
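The weights and rules described above determine how often a row is labeled by a rule rather than by the random fallback. The sketch below computes that probability exactly by enumerating every combination of the five factors, assuming (as in the generation code) that the factors are sampled independently. The helper name `matches_rule` is hypothetical; the options, weights, and rule conditions are copied from the notebook.

```python
from itertools import product

# Options and weights copied from the dataset generation code
factors = {
    'motivation': (['Helping others', 'Achieving success', 'Creative expression',
                    'Learning new things', 'Solving problems', 'Exploring the world'],
                   [0.25, 0.2, 0.15, 0.2, 0.1, 0.1]),
    'skill': (['Communication', 'Analytical thinking', 'Leadership',
               'Craftsmanship', 'Empathy', 'Creativity'],
              [0.15, 0.25, 0.1, 0.05, 0.3, 0.15]),
    'environment': (['A structured and organized setting', 'A fast-paced, challenging workplace',
                     'A creative and artistic space', 'A supportive and collaborative team',
                     'A flexible and independent work style', 'An outdoor, nature-oriented environment'],
                    [0.15, 0.2, 0.1, 0.25, 0.2, 0.1]),
    'impact': (['Making a difference', 'Innovating new technologies', 'Promoting equality and justice',
                'Fostering education and knowledge', 'Protecting the environment', 'Creating beauty or art'],
               [0.3, 0.1, 0.15, 0.2, 0.15, 0.1]),
    'known_for': (['Empathy', 'Creativity', 'Knowledge and expertise', 'Leadership and influence',
                   'Contributions to society', 'Adventurous spirit'],
                  [0.3, 0.2, 0.15, 0.1, 0.15, 0.1]),
}

def matches_rule(row):
    """Return True if any explicit rule in determine_ikigai would fire (hypothetical helper)."""
    return (
        (row['motivation'] == 'Helping others' and row['known_for'] == 'Empathy')
        or (row['skill'] == 'Analytical thinking' and row['impact'] == 'Innovating new technologies')
        or (row['motivation'] == 'Creative expression' and row['environment'] == 'A creative and artistic space')
        or (row['impact'] == 'Fostering education and knowledge' and row['known_for'] == 'Knowledge and expertise')
        or (row['impact'] == 'Protecting the environment' and row['known_for'] == 'Contributions to society')
        or (row['motivation'] == 'Achieving success' and row['skill'] == 'Leadership')
    )

names = list(factors)
p_rule = 0.0
# Enumerate all 6**5 combinations and add up the probability of those that match a rule
for combo in product(range(6), repeat=len(names)):
    row = {name: factors[name][0][i] for name, i in zip(names, combo)}
    prob = 1.0
    for name, i in zip(names, combo):
        prob *= factors[name][1][i]
    if matches_rule(row):
        p_rule += prob

p_fallback = 1.0 - p_rule
print(f"P(some rule matches) = {p_rule:.4f}")
print(f"P(random fallback)   = {p_fallback:.4f}")
```

Summing the six individual rule probabilities gives 0.1875, so by the union bound no more than about 19% of rows can match a rule. The remaining rows receive a uniformly random role, which is worth keeping in mind when reading accuracy figures for any model trained on this data.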

# **Dataset**

In [None]:
import pandas as pd
import numpy as np
import random

# Seed for reproducibility
np.random.seed(42)

# Define the options and weights for each factor
motivations = ['Helping others', 'Achieving success', 'Creative expression', 'Learning new things', 'Solving problems', 'Exploring the world']
skills = ['Communication', 'Analytical thinking', 'Leadership', 'Craftsmanship', 'Empathy', 'Creativity']
environments = ['A structured and organized setting', 'A fast-paced, challenging workplace', 'A creative and artistic space', 'A supportive and collaborative team', 'A flexible and independent work style', 'An outdoor, nature-oriented environment']
impacts = ['Making a difference', 'Innovating new technologies', 'Promoting equality and justice', 'Fostering education and knowledge', 'Protecting the environment', 'Creating beauty or art']
known_fors = ['Empathy', 'Creativity', 'Knowledge and expertise', 'Leadership and influence', 'Contributions to society', 'Adventurous spirit']
ikigai_roles = ['Social Worker', 'Engineer', 'Artist', 'Educator', 'Environmentalist', 'Leader']

# Psychological weightings based on age-appropriate interests and inclinations
motivation_weights = [0.25, 0.2, 0.15, 0.2, 0.1, 0.1]  # Emphasis on helping and learning
skill_weights = [0.15, 0.25, 0.1, 0.05, 0.3, 0.15]  # Higher weight for empathy and analytical thinking
environment_weights = [0.15, 0.2, 0.1, 0.25, 0.2, 0.1]  # Preference for supportive and flexible environments
impact_weights = [0.3, 0.1, 0.15, 0.2, 0.15, 0.1]  # Higher weight for making a difference
known_for_weights = [0.3, 0.2, 0.15, 0.1, 0.15, 0.1]  # Emphasis on empathy and creativity

# Randomly generate rows of data
num_students = 1000  # Number of samples

data = {
    'motivation': np.random.choice(motivations, num_students, p=motivation_weights),
    'skill': np.random.choice(skills, num_students, p=skill_weights),
    'environment': np.random.choice(environments, num_students, p=environment_weights),
    'impact': np.random.choice(impacts, num_students, p=impact_weights),
    'known_for': np.random.choice(known_fors, num_students, p=known_for_weights),
    'age': np.random.randint(15, 26, num_students)  # Age range from 15 to 25
}

# Convert to DataFrame
df = pd.DataFrame(data)

# Function to determine 'Ikigai' based on combined factors
def determine_ikigai(row):
    if row['motivation'] == 'Helping others' and row['known_for'] == 'Empathy':
        return 'Social Worker'
    elif row['skill'] == 'Analytical thinking' and row['impact'] == 'Innovating new technologies':
        return 'Engineer'
    elif row['motivation'] == 'Creative expression' and row['environment'] == 'A creative and artistic space':
        return 'Artist'
    elif row['impact'] == 'Fostering education and knowledge' and row['known_for'] == 'Knowledge and expertise':
        return 'Educator'
    elif row['impact'] == 'Protecting the environment' and row['known_for'] == 'Contributions to society':
        return 'Environmentalist'
    elif row['motivation'] == 'Achieving success' and row['skill'] == 'Leadership':
        return 'Leader'
    else:
        return np.random.choice(ikigai_roles)  # Random assignment if no strong match

# Apply the function to assign 'Ikigai'
df['ikigai'] = df.apply(determine_ikigai, axis=1)

# Save to CSV
df.to_csv('ikigai_dataset.csv', index=False)

print("Dataset saved as 'ikigai_dataset.csv'")

# Display the generated dataset
df.head(10)

Dataset saved as 'ikigai_dataset.csv'


| | motivation | skill | environment | impact | known_for | age | ikigai |
|---|---|---|---|---|---|---|---|
| 0 | Achieving success | Analytical thinking | A fast-paced, challenging workplace | Fostering education and knowledge | Knowledge and expertise | 17 | Educator |
| 1 | Exploring the world | Craftsmanship | A fast-paced, challenging workplace | Protecting the environment | Contributions to society | 22 | Environmentalist |
| 2 | Learning new things | Creativity | An outdoor, nature-oriented environment | Making a difference | Contributions to society | 21 | Engineer |
| 3 | Creative expression | Empathy | A fast-paced, challenging workplace | Fostering education and knowledge | Empathy | 17 | Social Worker |
| 4 | Helping others | Empathy | A fast-paced, challenging workplace | Fostering education and knowledge | Empathy | 20 | Social Worker |
| 5 | Helping others | Empathy | A flexible and independent work style | Protecting the environment | Empathy | 18 | Social Worker |
| 6 | Helping others | Empathy | A creative and artistic space | Creating beauty or art | Creativity | 25 | Educator |
| 7 | Solving problems | Empathy | A flexible and independent work style | Making a difference | Creativity | 19 | Educator |
| 8 | Learning new things | Analytical thinking | A structured and organized setting | Fostering education and knowledge | Leadership and influence | 25 | Engineer |
| 9 | Learning new things | Leadership | A supportive and collaborative team | Making a difference | Empathy | 19 | Environmentalist |


# **Machine Learning Code**

### Methodology

The methodology for the Ikigai prediction model involves several key steps, which include data preparation, feature engineering, model training, and evaluation. Here’s a detailed outline of the methodology:

#### 1. Data Collection
The dataset used for this project is `ikigai_dataset.csv`, which contains various features related to the concept of Ikigai, including:
- **Motivation**: What motivates the individual.
- **Skill**: The skills possessed by the individual.
- **Environment**: The preferred work environment.
- **Impact**: The impact the individual wants to make.
- **Known For**: Attributes or qualities the individual is known for.
- **Ikigai**: The target variable, representing the role the individual should pursue (e.g., Social Worker, Engineer, Artist).

#### 2. Data Preprocessing
- **Loading the Dataset**: The dataset is loaded using `pandas`.
- **Handling Categorical Data**: Categorical variables (motivation, skill, environment, impact, known for) are one-hot encoded to convert them into a format suitable for machine learning algorithms.
- **Splitting the Dataset**: The dataset is divided into training and testing sets using an 80/20 split.
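The preprocessing steps above can be sketched on a small example. The rows below are a hypothetical stand-in for `ikigai_dataset.csv` (only two of the five feature columns are included to keep it short); the calls to `pd.get_dummies` and `train_test_split` mirror the ones in the notebook.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy rows using the same column names as the real dataset
toy = pd.DataFrame({
    'motivation': ['Helping others', 'Achieving success', 'Creative expression',
                   'Helping others', 'Solving problems'] * 2,
    'skill': ['Empathy', 'Leadership', 'Creativity',
              'Communication', 'Analytical thinking'] * 2,
    'ikigai': ['Social Worker', 'Leader', 'Artist', 'Social Worker', 'Engineer'] * 2,
})

# One-hot encode: each distinct value becomes its own indicator column
X = pd.get_dummies(toy[['motivation', 'skill']])
y = toy['ikigai']

# 80/20 split with a fixed seed, as in the notebook
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print(X.columns.tolist())
print(f"train rows: {len(X_train)}, test rows: {len(X_test)}")
```

Each encoded column is named `<factor>_<value>`, which is what later allows a single user input to be aligned with the training columns.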

#### 3. Model Selection
A **Random Forest Classifier** is selected for this task due to its robustness and effectiveness in classification tasks. scikit-learn's implementation expects numeric input, so the categorical features are one-hot encoded before training. The model is initialized and trained using the training dataset.

#### 4. Model Training
- The Random Forest model is trained on the training dataset using the following steps:
  - Feature Selection: The five categorical factors (motivation, skill, environment, impact, known_for) are used as features; `age` is not used.
  - Training: The model learns from the one-hot encoded training data.

#### 5. Model Evaluation
- The model's performance is evaluated using the testing set.
- Training and test accuracy are computed and a confusion matrix is generated; `classification_report` is imported but not called in the code below.
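A minimal sketch of the evaluation step, including a classification report, might look like the following. The data here is a hypothetical stand-in in which a single categorical feature fully determines the label, so the scores are not indicative of results on the Ikigai dataset; only the metric calls are the point.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: the label is a deterministic function of 'skill'
rng = np.random.default_rng(0)
frame = pd.DataFrame({'skill': rng.choice(['Empathy', 'Leadership', 'Creativity'], size=300)})
labels = frame['skill'].map({'Empathy': 'Social Worker',
                             'Leadership': 'Leader',
                             'Creativity': 'Artist'})

X = pd.get_dummies(frame)
X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.2, random_state=42)

clf = RandomForestClassifier(random_state=42).fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Overall accuracy, per-class counts, and per-class precision/recall/F1
acc = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred, labels=clf.classes_)
report = classification_report(y_test, y_pred, zero_division=0)

print(f"Test accuracy: {acc:.3f}")
print(cm)
print(report)
```

Comparing training and test accuracy side by side, as the notebook does, is a quick way to spot a model that has memorized noisy labels.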

#### 6. Visualization
- **Confusion Matrix**: A confusion matrix is generated to visualize the model's predictions against the actual labels.
- **Feature Importance**: A bar plot is created to show the importance of each feature in the model's decision-making process.
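Because the features are one-hot encoded, the importance plot shows one bar per category value. One possible way to summarize importance per original factor is to sum the importances of the columns that share a factor prefix. The sketch below assumes hypothetical stand-in data with two factors, of which only `skill` carries signal; the grouping by prefix is an illustrative choice, not something the notebook does.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical stand-in data: 'skill' determines the label, 'known_for' is pure noise
rng = np.random.default_rng(0)
frame = pd.DataFrame({
    'skill': rng.choice(['Empathy', 'Leadership', 'Creativity'], size=300),
    'known_for': rng.choice(['Empathy', 'Creativity'], size=300),
})
labels = frame['skill'].map({'Empathy': 'Social Worker',
                             'Leadership': 'Leader',
                             'Creativity': 'Artist'})

X = pd.get_dummies(frame)
clf = RandomForestClassifier(random_state=42).fit(X, labels)

# One importance value per one-hot column
importances = pd.Series(clf.feature_importances_, index=X.columns)

# Sum importances over columns named '<factor>_<value>'
factor_names = ['skill', 'known_for']
by_factor = {
    factor: float(importances[[c for c in X.columns if c.startswith(factor + '_')]].sum())
    for factor in factor_names
}
print(by_factor)
```

The per-factor totals sum to 1, which makes them easier to compare than dozens of individual category bars.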

#### 7. Deployment
The model is deployed using a Flask web application, allowing users to input their motivations, skills, environments, impacts, and known for attributes through a user-friendly interface. The model predicts the Ikigai role and displays the results along with visualizations.

### Algorithm

The algorithm used for the prediction is based on the **Random Forest Classifier**. The core steps are as follows:

1. **Input Features**: The user inputs five categorical features (motivation, skill, environment, impact, known for).
2. **Data Transformation**: The input data is transformed using one-hot encoding to create a binary representation of the categorical features.
3. **Model Prediction**: The trained Random Forest model makes a prediction based on the encoded features.
4. **Output**: The predicted Ikigai role is returned to the user along with visualizations of the confusion matrix and feature importance.
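The data transformation in step 2 relies on aligning a single encoded input row with the columns seen during training. A minimal sketch of that alignment, using a hypothetical two-factor training frame, is shown below; the `get_dummies(...).reindex(columns=..., fill_value=0)` pattern is the same one used in the `/predict` route.

```python
import pandas as pd

# Hypothetical training frame; in the notebook this is the full feature table
train = pd.DataFrame({
    'motivation': ['Helping others', 'Achieving success', 'Creative expression'],
    'skill': ['Empathy', 'Leadership', 'Creativity'],
})
train_columns = pd.get_dummies(train).columns

def encode(payload):
    """One-hot encode one input dict and align it to the training columns."""
    row = pd.get_dummies(pd.DataFrame([payload]))
    return row.reindex(columns=train_columns, fill_value=0)

def active_columns(encoded):
    """List the columns set to 1/True in a single encoded row."""
    return [c for c in encoded.columns if bool(encoded.loc[0, c])]

known = encode({'motivation': 'Helping others', 'skill': 'Leadership'})
print(active_columns(known))

# A value never seen in training has no matching column and is dropped by reindex
unseen = encode({'motivation': 'Something else', 'skill': 'Empathy'})
print(active_columns(unseen))
```

Note that an unseen or misspelled value is silently encoded as all zeros for that factor, so the model still returns a prediction; validating inputs against the known option lists before encoding would make such cases visible.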

### Equations Used

While the Random Forest algorithm itself is complex and involves ensemble learning techniques, some fundamental concepts and equations related to decision trees (the building blocks of Random Forest) can be mentioned:

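1. **Gini Impurity**: Used to measure the impurity of a node in decision trees, where $p_i$ is the fraction of samples in the node that belong to class $i$ and $C$ is the number of classes:

   $$G = 1 - \sum_{i=1}^{C} p_i^2$$

2. **Entropy**: Another metric for measuring impurity:

   $$H = -\sum_{i=1}^{C} p_i \log_2 p_i$$

3. **Information Gain**: The reduction in entropy or Gini impurity achieved by partitioning the data based on an attribute. For a parent node $S$ with $N$ samples, split into child nodes $S_k$ with $N_k$ samples each, and an impurity measure $I$ (either $H$ or $G$):

   $$IG = I(S) - \sum_{k} \frac{N_k}{N} \, I(S_k)$$

4. **Random Forest Ensemble Prediction**: The final prediction from a Random Forest is obtained by aggregating the predictions from all individual trees. With $T$ trees $h_1, \dots, h_T$, majority voting gives:

   $$\hat{y}(x) = \operatorname{mode}\{h_1(x), \dots, h_T(x)\}$$

   scikit-learn's `RandomForestClassifier` averages the trees' predicted class probabilities and returns the class with the highest mean probability, rather than counting hard votes.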
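The quantities above can be checked numerically with a short sketch. The function names and the toy labels are illustrative only; they are not part of the notebook or of scikit-learn.

```python
import math
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class fractions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def entropy(labels):
    """Shannon entropy in bits: minus the sum of p * log2(p) over classes."""
    n = len(labels)
    return -sum((count / n) * math.log2(count / n) for count in Counter(labels).values())

def information_gain(parent, children, impurity=entropy):
    """Parent impurity minus the size-weighted impurity of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * impurity(child) for child in children)
    return impurity(parent) - weighted

def majority_vote(predictions):
    """Most common label among the per-tree predictions."""
    return Counter(predictions).most_common(1)[0][0]

# Toy node with two equally frequent classes, split perfectly into two pure children
parent = ['Artist'] * 4 + ['Engineer'] * 4
left, right = ['Artist'] * 4, ['Engineer'] * 4

print("Gini(parent)    =", gini(parent))
print("Entropy(parent) =", entropy(parent))
print("IG (entropy)    =", information_gain(parent, [left, right]))
print("IG (Gini)       =", information_gain(parent, [left, right], impurity=gini))
print("Vote            =", majority_vote(['Artist', 'Engineer', 'Artist']))
```

For a balanced two-class node the Gini impurity is 0.5 and the entropy is 1 bit; a split into two pure children removes all of it, so the information gain equals the parent impurity in both cases.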

### Conclusion
This methodology outlines the steps taken to develop and deploy the Ikigai prediction model, detailing the data processing, algorithm selection, model evaluation, and deployment stages. The use of Random Forest provides a robust solution for predicting the ideal role based on personal attributes and aspirations.

In [None]:
# Install required packages
!pip install flask-ngrok joblib pandas scikit-learn ngrok

Collecting flask-ngrok
  Downloading flask_ngrok-0.0.25-py3-none-any.whl.metadata (1.8 kB)
Collecting ngrok
  Downloading ngrok-1.4.0-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (19 kB)
Downloading flask_ngrok-0.0.25-py3-none-any.whl (3.1 kB)
Downloading ngrok-1.4.0-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.1 MB)
Installing collected packages: ngrok, flask-ngrok
Successfully installed flask-ngrok-0.0.25 ngrok-1.4.0


In [None]:
!pip install pyngrok

Collecting pyngrok
  Downloading pyngrok-7.2.1-py3-none-any.whl.metadata (8.3 kB)
Downloading pyngrok-7.2.1-py3-none-any.whl (22 kB)
Installing collected packages: pyngrok
Successfully installed pyngrok-7.2.1


In [None]:
# Import necessary libraries
from flask import Flask, request, jsonify, render_template_string
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
import seaborn as sns
import matplotlib.pyplot as plt
import joblib
from io import BytesIO
import base64
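from pyngrok import ngrok
import os

# Authenticate ngrok with a token read from the environment rather than a hard-coded secret.
# Assumption: the token has been stored in an environment variable named NGROK_AUTHTOKEN.
ngrok.set_auth_token(os.environ["NGROK_AUTHTOKEN"])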

# Initialize Flask app
app = Flask(__name__)

# Load and prepare the dataset
df = pd.read_csv('ikigai_dataset.csv')

# Separate features and target variable
X = df[['motivation', 'skill', 'environment', 'impact', 'known_for']]
y = df['ikigai']

# One-hot encode categorical features
X_encoded = pd.get_dummies(X)

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X_encoded, y, test_size=0.2, random_state=42)

# Initialize and train the RandomForest model
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

# Predictions
y_train_pred = model.predict(X_train)
y_test_pred = model.predict(X_test)

# Calculate accuracy
train_accuracy = accuracy_score(y_train, y_train_pred)
test_accuracy = accuracy_score(y_test, y_test_pred)

# Save the model to a file
joblib.dump(model, 'ikigai_model.pkl')

# Function to create a confusion matrix plot
def create_confusion_matrix_plot():
    conf_matrix = confusion_matrix(y_test, y_test_pred)
    plt.figure(figsize=(8, 6))
    sns.heatmap(conf_matrix, annot=True, fmt='d', cmap='Blues', xticklabels=model.classes_, yticklabels=model.classes_)
    plt.title("Confusion Matrix")
    plt.xlabel("Predicted Label")
    plt.ylabel("True Label")

    buf = BytesIO()
    plt.savefig(buf, format='png')
    plt.close()
    buf.seek(0)
    return base64.b64encode(buf.read()).decode('utf-8')

# Function to create a feature importance plot
def create_feature_importance_plot():
    feature_importances = model.feature_importances_
    features = X_encoded.columns
    importance_df = pd.DataFrame({'Feature': features, 'Importance': feature_importances})
    importance_df = importance_df.sort_values(by='Importance', ascending=False)

    plt.figure(figsize=(10, 8))
    sns.barplot(x='Importance', y='Feature', data=importance_df)
    plt.title("Feature Importance")
    plt.xlabel("Importance")
    plt.ylabel("Feature")

    buf = BytesIO()
    plt.savefig(buf, format='png')
    plt.close()
    buf.seek(0)
    return base64.b64encode(buf.read()).decode('utf-8')

# Define the HTML content for the web interface
@app.route('/')
def index():
    # Define the options for each category
    motivations = ['Helping others', 'Achieving success', 'Creative expression', 'Learning new things', 'Solving problems', 'Exploring the world']
    skills = ['Communication', 'Analytical thinking', 'Leadership', 'Craftsmanship', 'Empathy', 'Creativity']
    environments = ['A structured and organized setting', 'A fast-paced, challenging workplace', 'A creative and artistic space', 'A supportive and collaborative team', 'A flexible and independent work style', 'An outdoor, nature-oriented environment']
    impacts = ['Making a difference', 'Innovating new technologies', 'Promoting equality and justice', 'Fostering education and knowledge', 'Protecting the environment', 'Creating beauty or art']
    known_fors = ['Empathy', 'Creativity', 'Knowledge and expertise', 'Leadership and influence', 'Contributions to society', 'Adventurous spirit']

    # Generate the HTML for dropdown options
    motivations_options = ''.join([f'<option value="{motivation}">{motivation}</option>' for motivation in motivations])
    skills_options = ''.join([f'<option value="{skill}">{skill}</option>' for skill in skills])
    environments_options = ''.join([f'<option value="{environment}">{environment}</option>' for environment in environments])
    impacts_options = ''.join([f'<option value="{impact}">{impact}</option>' for impact in impacts])
    known_fors_options = ''.join([f'<option value="{known_for}">{known_for}</option>' for known_for in known_fors])

    # Return the HTML content
    return f'''
    <!DOCTYPE html>
    <html>
    <head>
        <title>Ikigai Predictor</title>
        <style>
            body {{ font-family: Arial, sans-serif; }}
            label, select, button {{ display: block; margin: 10px 0; }}
        </style>
    </head>
    <body>
        <h1>Ikigai Prediction</h1>
        <form id="predictionForm">
            <label>Motivation:</label>
            <select id="motivation" required>
                <option value="">Select</option>
                {motivations_options}
            </select>

            <label>Skill:</label>
            <select id="skill" required>
                <option value="">Select</option>
                {skills_options}
            </select>

            <label>Environment:</label>
            <select id="environment" required>
                <option value="">Select</option>
                {environments_options}
            </select>

            <label>Impact:</label>
            <select id="impact" required>
                <option value="">Select</option>
                {impacts_options}
            </select>

            <label>Known For:</label>
            <select id="known_for" required>
                <option value="">Select</option>
                {known_fors_options}
            </select>

            <button type="button" onclick="submitForm()">Predict</button>
        </form>
        <div id="result"></div>
        <div id="confusionMatrix"></div>
        <div id="featureImportance"></div>

        <script>
            async function submitForm() {{
                const data = {{
                    motivation: document.getElementById('motivation').value,
                    skill: document.getElementById('skill').value,
                    environment: document.getElementById('environment').value,
                    impact: document.getElementById('impact').value,
                    known_for: document.getElementById('known_for').value
                }};

                const response = await fetch('/predict', {{
                    method: 'POST',
                    headers: {{ 'Content-Type': 'application/json' }},
                    body: JSON.stringify(data)
                }});
                const result = await response.json();

                document.getElementById('result').innerHTML = 'Predicted Ikigai: ' + result.prediction;
                document.getElementById('confusionMatrix').innerHTML = '<img src="data:image/png;base64,' + result.confusion_matrix + '">';
                document.getElementById('featureImportance').innerHTML = '<img src="data:image/png;base64,' + result.feature_importance + '">';
            }}
        </script>
    </body>
    </html>
    '''

@app.route('/predict', methods=['POST'])
def predict():
    data = request.get_json()
    # Transform the input data to match the model's expected input
    input_df = pd.DataFrame([data])
    input_encoded = pd.get_dummies(input_df).reindex(columns=X_encoded.columns, fill_value=0)

    # Make prediction
    prediction = model.predict(input_encoded)[0]

    # Create visualizations
    confusion_matrix_plot = create_confusion_matrix_plot()
    feature_importance_plot = create_feature_importance_plot()

    return jsonify({
        'prediction': prediction,
        'confusion_matrix': confusion_matrix_plot,
        'feature_importance': feature_importance_plot
    })

# Start the app
if __name__ == '__main__':
    public_url = ngrok.connect(5000)
    print(" * Public URL:", public_url)
    app.run(port=5000)

 * Public URL: NgrokTunnel: "https://1f82-34-106-206-64.ngrok-free.app" -> "http://localhost:5000"
 * Serving Flask app '__main__'
 * Debug mode: off


 * Running on http://127.0.0.1:5000
INFO:werkzeug:Press CTRL+C to quit
INFO:werkzeug:127.0.0.1 - - [08/Nov/2024 15:19:58] "GET / HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [08/Nov/2024 15:19:58] "GET /favicon.ico HTTP/1.1" 404 -
INFO:werkzeug:127.0.0.1 - - [08/Nov/2024 15:20:06] "POST /predict HTTP/1.1" 200 -
INFO:werkzeug:127.0.0.1 - - [08/Nov/2024 15:20:08] "POST /predict HTTP/1.1" 200 -
