<h4>The three datasets provide detailed information on user registration, daily check-ins, and onboarding. Here is how each relates to the proposed pipeline:</h4>

<h5>Registration Data:</h5>
Includes user IDs, contact information, notification preferences, and registration dates.
Useful for understanding the app engagement context.

<h5>Daily Check-In Data:</h5>
Tracks daily responses such as menstrual flow, body temperature, symptoms, sleep, exercise, and mood.
Serves as the primary source for daily score computation.

<h5>Onboarding Data:</h5>
Captures user demographics, health habits, and baseline metrics (e.g., cycle length, sleep, activity levels).
Useful for personalizing scores and setting baselines.
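
All three tables carry a `user_id` column, so onboarding baselines can be attached to each check-in row with a join. The following is a minimal sketch on tiny made-up frames: only the column names mirror the datasets, the values and the `_demo` names are illustrative assumptions.

```python
import pandas as pd

# Tiny illustrative frames; only the column names mirror the real datasets.
checkin_demo = pd.DataFrame({
    "user_id": ["user_1", "user_1", "user_2"],
    "date": ["2024-12-09", "2024-12-10", "2024-12-09"],
    "Sleep Duration": ["6-7 hours", "More than 8 hours", "7-8 hours"],
})
onb_demo = pd.DataFrame({
    "user_id": ["user_1", "user_2"],
    "sleep_levels": ["7-8 hours", "Fewer than 6 hours"],
})

# A left join keeps every check-in row. validate="many_to_one" raises an error
# if the onboarding table has duplicate user_id rows, which would otherwise
# silently multiply check-ins.
merged = checkin_demo.merge(onb_demo, on="user_id", how="left", validate="many_to_one")
print(merged)
```

Each check-in row then carries the user's baseline `sleep_levels` next to the daily `Sleep Duration`, which is one possible starting point for personalized scoring.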


In [84]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

onb = pd.read_csv("Onboarding_Data.csv")
reg = pd.read_csv("Registration_Data.csv")
checkin = pd.read_csv("Check-In-data.csv")


In [85]:
onb.head()

Unnamed: 0,user_id,record_date,app_purpose,birth_year,gender,height,weight,birth_control,regularity,last_period_date,period_len_est,cycle_len_est,menstrual_pain,products,wear_device,sleep_levels,activity_levels,sedenary_levels
0,user_1,2024-02-20,Pregnancy Tracking,2012,Female,5.3,91.5,,Very Regular,2024-09-21,5,Fewer than 21 days,Very severe pain (disrupts daily activities),Sanitary Pads,Yes,7-8 hours,30 min to 1 hour,5 hours to 8 hours
1,user_2,2024-07-18,Sex Life Improvement,1960,Male,5.9,194.5,Birth Control Ring,Somewhat Irregular,2024-11-16,8,21-35 days,Moderate pain (requires over-the-counter pain ...,Period Underwear,Yes,Fewer than 6 hours,Less than 30 min,More than 8 hours
2,user_3,2024-06-08,Mood & Behavior Tracking,1972,Female,6.1,110.1,Patch,Very Regular,2024-09-14,5,21-35 days,Mild pain (manageable without medication),Menstrual Sponges,No,7-8 hours,Less than 30 min,Fewer than 5 hours
3,user_4,2024-06-10,Cycle & Symptom Monitoring,2003,Non-binary,6.0,243.1,Birth Control Pills,Very Regular,2024-10-24,6,21-35 days,Very severe pain (disrupts daily activities),Sanitary Pads,Yes,Fewer than 6 hours,More than 1 hour,More than 8 hours
4,user_5,2024-06-07,Period Tracking,1982,Female,5.0,219.3,Non-Hormonal IUD,Very Regular,2024-11-26,5,More than 35 days,Mild pain (manageable without medication),Sanitary Pads,No,6-7 hours,More than 1 hour,More than 8 hours


In [86]:
reg.head()

Unnamed: 0,user_id,user_name,phone_num,email,password,confirm_pw,notification_type,notification_period,register_date
0,user_1,user_name_1,8022732000.0,user1@example.com,password123,password123,Email,,2024-01-09
1,user_2,user_name_2,9887720000.0,user2@example.com,password123,password123,Email,Weekly,2024-01-20
2,user_3,user_name_3,1239456000.0,user3@example.com,password123,password123,App Notifications,Monthly,2024-01-21
3,user_4,user_name_4,2299959000.0,user4@example.com,password123,password123,,,2024-01-30
4,user_5,user_name_5,,user5@example.com,password123,password123,Email,Monthly,2024-01-04


In [87]:
checkin.head()

Unnamed: 0,user_id,date,Are you on your period today?,Season,Menstrual Flow Level,Discharge,Body Temperature,Symptoms,Menstrual Pain Level,Sexual Intercourse,Birth Control,Sleep Duration,Rested Feeling,Exercise Today,Exercise Intensity,Mood,Notes on Mood
0,user_1,2024-12-09,Yes,Fall,Heavy,Blood-tinged discharge (spotting),36.7,Sleep Disturbances,Moderate pain (required over-the-counter pain ...,No,Patch,6-7 hours,Somewhat rested,Yes,"Light (e.g., walking, stretching)",Content,I did not sleep well so I feel drained
1,user_1,2024-12-10,Yes,Fall,Heavy,Watery discharge,36.5,Headaches or Migraines,Mild pain (manageable without medication),Yes,Fertility Awareness Based Methods,More than 8 hours,Somewhat rested,No,"Mixed (a combination of light, moderate, and v...",Energetic,I did not sleep well so I feel drained
2,user_1,2024-12-11,Yes,Fall,Moderate,Thick or clumpy discharge,37.0,Joint or Muscle Pain,Moderate pain (required over-the-counter pain ...,No,Other,6-7 hours,Well rested,Yes,,Lonely,Very happy today
3,user_1,2024-12-12,Yes,Fall,Light,Dry or very little discharge,37.7,Mood Swings,Severe pain (required prescription pain medica...,No,Patch,More than 8 hours,Not well rested,No,"Light (e.g., walking, stretching)",Irritable,Very happy today
4,user_1,2024-11-25,Yes,Fall,Very heavy,Sticky or tacky discharge,37.4,Bloating,Severe pain (required prescription pain medica...,Yes,Birth Control Ring,7-8 hours,Very well rested,Yes,"Light (e.g., walking, stretching)",Neutral,Very happy today


In [88]:
def preprocess_data(data):
    """
    Preprocess the check-in data for scoring.
    This includes cleaning, normalizing, and encoding metrics into numerical forms.
    """
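    # Work on a copy so the caller's DataFrame is untouched and the function
    # can be re-run safely (re-mapping an already mapped column would yield NaN)
    data = data.copy()

    # Convert "Yes/No" to binary in relevant columns
    data["On Period"] = data["Are you on your period today?"].map({"Yes": 1, "No": 0})
    data["Sexual Intercourse"] = data["Sexual Intercourse"].map({"Yes": 1, "No": 0})
    data["Exercise Today"] = data["Exercise Today"].map({"Yes": 1, "No": 0})

    # Map "Sleep Duration" to numerical scores
    # (here and below, any label missing from a mapping becomes NaN)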
    sleep_mapping = {
        "More than 8 hours": 100,
        "7-8 hours": 80,
        "6-7 hours": 60,
        "Less than 5 hours": 20
    }
    data["Sleep Score"] = data["Sleep Duration"].map(sleep_mapping)

    # Map "Menstrual Pain Level" to numerical scores
    pain_mapping = {
        "No pain or discomfort": 100,
        "Mild pain (manageable without medication)": 80,
        "Moderate pain (required over-the-counter pain medication)": 50,
        "Severe pain (required prescription pain medication)": 20
    }
    data["Pain Score"] = data["Menstrual Pain Level"].map(pain_mapping)

    # Map "Mood" to numerical scores
    mood_mapping = {
        "Energetic": 100,
        "Content": 80,
        "Neutral": 60,
        "Irritable": 40,
        "Lonely": 20
    }
    data["Mood Score"] = data["Mood"].map(mood_mapping)

    # Convert "Exercise Intensity" to numerical scores
    exercise_mapping = {
        "None": 0,
        "Light (e.g., walking, stretching)": 50,
        "Moderate (e.g., jogging, yoga)": 70,
        "Vigorous (e.g., running, HIIT)": 100,
        "Mixed (a combination of light, moderate, and vigorous activities)": 80
    }
    data["Exercise Score"] = data["Exercise Intensity"].map(exercise_mapping)

    # Fill missing body temperatures with the column median
    # (the score columns are not filled and may still contain NaN)
    data["Body Temperature"] = data["Body Temperature"].fillna(data["Body Temperature"].median())

    return data
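
Because `Series.map` returns NaN for any label that is not a key in the mapping, it can help to list uncovered labels before scoring. Below is a minimal sketch. The helper name `unmapped_labels` is hypothetical, and the sample series is illustrative: "Fewer than 6 hours" appears in the onboarding table, but whether it also occurs in the check-in data is an assumption.

```python
import pandas as pd

def unmapped_labels(series, mapping):
    """Return the non-null labels in `series` that `mapping` does not cover."""
    present = set(series.dropna().unique())
    return sorted(present - set(mapping))

# Illustrative sample, not taken from the check-in file.
sleep_demo = pd.Series(["7-8 hours", "Fewer than 6 hours", None, "6-7 hours"])
sleep_mapping_demo = {
    "More than 8 hours": 100,
    "7-8 hours": 80,
    "6-7 hours": 60,
    "Less than 5 hours": 20,
}

missing = unmapped_labels(sleep_demo, sleep_mapping_demo)
print(missing)
```

Running the same check on each mapped column of the real data would show which categories, if any, silently turn into NaN scores.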


In [89]:
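def calculate_user_score(row):
    """
    Calculate the user score for a single check-in row.

    Expects the columns added by preprocess_data. The 0/1 flags ("On Period",
    "Sexual Intercourse") are not rescaled, so each contributes at most 0.1
    points, while the 0-100 scores dominate the total.
    """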
    # Define weights for metrics
    weights = {
        "On Period": 0.1,
        "Mood Score": 0.25,
        "Sleep Score": 0.1,
        "Pain Score": 0.2,
        "Exercise Score": 0.15,
        "Sexual Intercourse": 0.1,
        "Body Temperature": 0.1  # Placeholder; could be normalized further
    }

    # Calculate weighted score
    score = (
        weights["On Period"] * row["On Period"] +
        weights["Mood Score"] * row["Mood Score"] +
        weights["Sleep Score"] * row["Sleep Score"] +
        weights["Pain Score"] * row["Pain Score"] +
        weights["Exercise Score"] * row["Exercise Score"] +
        weights["Sexual Intercourse"] * row["Sexual Intercourse"] +
        weights["Body Temperature"] * (row["Body Temperature"] - 36.5) * 10  # Scaled deviation
    )
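    # Clamp score between 50 and 100. Caution: if any input is NaN the score is
    # NaN, and min(100, NaN) evaluates to 100, so such a row is reported as 100.
    return max(50, min(100, score))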
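
As a worked example, the weighted sum can be evaluated by hand for the check-in of user_1 on 2024-12-10 (Energetic, more than 8 hours of sleep, mild pain, mixed exercise, 36.5 degrees). The sketch below also shows how Python's built-in `min`/`max` behave on NaN, and a NaN-aware clamp; `clamp_score` is a hypothetical helper, not part of the notebook.

```python
import math

weights = {
    "On Period": 0.1, "Mood Score": 0.25, "Sleep Score": 0.1, "Pain Score": 0.2,
    "Exercise Score": 0.15, "Sexual Intercourse": 0.1, "Body Temperature": 0.1,
}

# Encoded values for user_1 on 2024-12-10, following the mappings above.
row = {
    "On Period": 1, "Mood Score": 100, "Sleep Score": 100, "Pain Score": 80,
    "Exercise Score": 80, "Sexual Intercourse": 1, "Body Temperature": 36.5,
}

score = sum(weights[k] * row[k] for k in weights if k != "Body Temperature")
score += weights["Body Temperature"] * (row["Body Temperature"] - 36.5) * 10
print(round(score, 1))  # 0.1 + 25 + 10 + 16 + 12 + 0.1 + 0 = 63.2

# Every comparison with NaN is False, so min() keeps its first argument.
nan_clamped = max(50, min(100, float("nan")))
print(nan_clamped)  # 100, not NaN

def clamp_score(value, low=50, high=100):
    """Clamp to [low, high], but pass NaN through instead of hiding it."""
    if math.isnan(value):
        return float("nan")
    return max(low, min(high, value))
```

With a clamp of this kind, a row with a missing component would surface as NaN rather than as a top score.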


In [90]:
# Preprocess the data
preprocessed_data = preprocess_data(checkin)

# Calculate user scores
preprocessed_data["User Score"] = preprocessed_data.apply(calculate_user_score, axis=1)
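
The row-wise `apply` can also be written column-wise. The following is a sketch of one possible vectorized variant, not the notebook's method: `score_frame` is a hypothetical name, and unlike `calculate_user_score` it leaves a score as NaN when any component is missing. The two demo rows use the encoded values of user_1's check-ins on 2024-12-10 and 2024-12-11.

```python
import numpy as np
import pandas as pd

# Same weights as calculate_user_score.
WEIGHTS = {
    "On Period": 0.1, "Mood Score": 0.25, "Sleep Score": 0.1, "Pain Score": 0.2,
    "Exercise Score": 0.15, "Sexual Intercourse": 0.1, "Body Temperature": 0.1,
}

def score_frame(df):
    """Column-wise weighted score; rows with a missing component stay NaN."""
    terms = df[list(WEIGHTS)].copy()
    # Body temperature enters as a scaled deviation from 36.5.
    terms["Body Temperature"] = (terms["Body Temperature"] - 36.5) * 10
    # Multiplying by a Series aligns the weights with the column names.
    raw = (terms * pd.Series(WEIGHTS)).sum(axis=1, skipna=False)
    return raw.clip(lower=50, upper=100)

demo = pd.DataFrame({
    "On Period": [1, 1],
    "Mood Score": [100.0, 20.0],
    "Sleep Score": [100.0, 60.0],
    "Pain Score": [80.0, 50.0],
    "Exercise Score": [80.0, np.nan],  # second row has no exercise intensity
    "Sexual Intercourse": [1, 0],
    "Body Temperature": [36.5, 37.0],
})
scores = score_frame(demo)
print(scores)
```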



In [91]:
preprocessed_data.head()

Unnamed: 0,user_id,date,Are you on your period today?,Season,Menstrual Flow Level,Discharge,Body Temperature,Symptoms,Menstrual Pain Level,Sexual Intercourse,...,Exercise Today,Exercise Intensity,Mood,Notes on Mood,On Period,Sleep Score,Pain Score,Mood Score,Exercise Score,User Score
0,user_1,2024-12-09,Yes,Fall,Heavy,Blood-tinged discharge (spotting),36.7,Sleep Disturbances,Moderate pain (required over-the-counter pain ...,0,...,1,"Light (e.g., walking, stretching)",Content,I did not sleep well so I feel drained,1,60.0,50.0,80.0,50.0,50.0
1,user_1,2024-12-10,Yes,Fall,Heavy,Watery discharge,36.5,Headaches or Migraines,Mild pain (manageable without medication),1,...,0,"Mixed (a combination of light, moderate, and v...",Energetic,I did not sleep well so I feel drained,1,100.0,80.0,100.0,80.0,63.2
2,user_1,2024-12-11,Yes,Fall,Moderate,Thick or clumpy discharge,37.0,Joint or Muscle Pain,Moderate pain (required over-the-counter pain ...,0,...,1,,Lonely,Very happy today,1,60.0,50.0,20.0,,100.0
3,user_1,2024-12-12,Yes,Fall,Light,Dry or very little discharge,37.7,Mood Swings,Severe pain (required prescription pain medica...,0,...,0,"Light (e.g., walking, stretching)",Irritable,Very happy today,1,100.0,20.0,40.0,50.0,50.0
4,user_1,2024-11-25,Yes,Fall,Very heavy,Sticky or tacky discharge,37.4,Bloating,Severe pain (required prescription pain medica...,1,...,1,"Light (e.g., walking, stretching)",Neutral,Very happy today,1,80.0,20.0,60.0,50.0,50.0


In [92]:
preprocessed_data[["user_id","date","User Score"]]

Unnamed: 0,user_id,date,User Score
0,user_1,2024-12-09,50.0
1,user_1,2024-12-10,63.2
2,user_1,2024-12-11,100.0
3,user_1,2024-12-12,50.0
4,user_1,2024-11-25,50.0
...,...,...,...
1082,user_50,2024-11-10,100.0
1083,user_50,2024-10-26,100.0
1084,user_50,2024-10-26,100.0
1085,user_50,2024-11-23,100.0
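
The listing above shows user_50 twice on 2024-10-26, and scores of exactly 100.0 can come from a missing component rather than from high values. A small sanity check for both cases might look like the sketch below. The frame is illustrative: the `Exercise Score` values for these rows are made up, and `component_cols` would need to list all score columns in practice.

```python
import numpy as np
import pandas as pd

# Illustrative frame; the Exercise Score values are invented for the demo.
demo = pd.DataFrame({
    "user_id": ["user_50", "user_50", "user_50"],
    "date": ["2024-10-26", "2024-10-26", "2024-11-23"],
    "Exercise Score": [np.nan, 50.0, np.nan],
    "User Score": [100.0, 50.0, 100.0],
})

# Rows that share a user and a date (keep=False marks every member of a group).
dupes = demo[demo.duplicated(subset=["user_id", "date"], keep=False)]

# Rows whose score was computed from at least one missing component.
component_cols = ["Exercise Score"]  # extend with the other score columns
has_gap = demo[component_cols].isna().any(axis=1)

print(len(dupes), int(has_gap.sum()))
```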


In [93]:
# Export the scored check-ins for further processing
preprocessed_data.to_csv("check_in_with_user_score.csv", index=False)