Purpose of this notebook is to group the urls by url hierarchy, according to the second level of the programmes (after `/programmes/`)

Please check that there's a folder named `program-sub-pages` in this `notebooks` folder in order to store the created excels later on.

Please run `get_program_subpages.ipynb` first to get the relevant excel file to run this notebook.

Note:
- Do not worry about the squiggly line under the dataframe variables, as I have defined the variables using global(). As long as the cells have been ran, there should be no issue.

<hr>

### Setup

In [31]:
import os
import pandas as pd

In [32]:
folder_path = 'program-sub-pages'
if not os.path.exists(folder_path):
    # If it doesn't exist, create the folder
    os.makedirs(folder_path)
    print(f"Folder created: {folder_path}")
else:
    print(f"Folder already exists: {folder_path}")

Folder already exists: program-sub-pages


Load Clean Dataset of Program Sub-pages URL

In [80]:
df = pd.read_excel(r"program-sub-pages\cleaned_programSubpages.xlsx", sheet_name=0)
# For more information & checking on the rows, you can look this (Optional):
# df_cols = df = pd.read_excel(r"program-sub-pages\cleaned_programSubpages.xlsx", sheet_name=1)
df

Unnamed: 0,id,title,full_url,extracted_content_body,content_category
0,1434688,Screen for Life - National Health Screening Pr...,https://www.healthhub.sg/programmes/Screen_for...,We note that some users have experienced issue...,programs
1,1434612,"Live Well, Age Well Programme",https://www.healthhub.sg/programmes/AAP,"By 7 February 2023, all users must perform a o...",programs
2,1434660,The Healthy 365 App,https://www.healthhub.sg/programmes/healthyliving,[https://go.gov.sg/useh365] [https://go.gov.s...,programs
3,1434580,"Eat, Drink, Shop Healthy Challenge",https://www.healthhub.sg/programmes/eat-drink-...,Menu [#clear]\n\nMenu [#clear]\n\nMenu [...,programs
4,1434657,Great things start when you MOVE IT!,https://www.healthhub.sg/programmes/LetsMoveIt,[https://go.gov.sg/h365-moveit]\n\nGet Moving ...,programs
...,...,...,...,...,...
359,1435127,Types of diabetes | Diabetes Hub,https://www.healthhub.sg/programmes/diabetes-h...,3 BEaTMS TO BEAT DIABETES [/programmes/diabete...,program-sub-pages
360,1435167,Hypoglycaemia | Diabetes Hub,https://www.healthhub.sg/programmes/diabetes-h...,3 BES TO BEAT DIABETES [/programmes/diabetes-h...,program-sub-pages
361,1468676,Program Sub level 1,https://www.healthhub.sg/programmes/1test/sya-...,HealthHub\nRelaxation-ExerciseHealthhub\nList ...,program-sub-pages
362,1435215,If you have coronary heart disease | Diabetes Hub,https://www.healthhub.sg/programmes/diabetes-h...,3 BES TO BEAT DIABETES [/programmes/diabetes-h...,program-sub-pages


<hr>

### Extraction of second level of URL hierarchy

In [81]:
# Duplicate df to keep original df unchanged
df_new = df

In [82]:
# Create new col to store extracted portion of url after '/programmes/'
df_new['secondLvl'] = df_new['full_url'].str.split('/programmes/').str[1].str.split('/').str[0]
df_new

Unnamed: 0,id,title,full_url,extracted_content_body,content_category,secondLvl
0,1434688,Screen for Life - National Health Screening Pr...,https://www.healthhub.sg/programmes/Screen_for...,We note that some users have experienced issue...,programs,Screen_for_Life
1,1434612,"Live Well, Age Well Programme",https://www.healthhub.sg/programmes/AAP,"By 7 February 2023, all users must perform a o...",programs,AAP
2,1434660,The Healthy 365 App,https://www.healthhub.sg/programmes/healthyliving,[https://go.gov.sg/useh365] [https://go.gov.s...,programs,healthyliving
3,1434580,"Eat, Drink, Shop Healthy Challenge",https://www.healthhub.sg/programmes/eat-drink-...,Menu [#clear]\n\nMenu [#clear]\n\nMenu [...,programs,eat-drink-shop-healthy-challenge
4,1434657,Great things start when you MOVE IT!,https://www.healthhub.sg/programmes/LetsMoveIt,[https://go.gov.sg/h365-moveit]\n\nGet Moving ...,programs,LetsMoveIt
...,...,...,...,...,...,...
359,1435127,Types of diabetes | Diabetes Hub,https://www.healthhub.sg/programmes/diabetes-h...,3 BEaTMS TO BEAT DIABETES [/programmes/diabete...,program-sub-pages,diabetes-hub-v2
360,1435167,Hypoglycaemia | Diabetes Hub,https://www.healthhub.sg/programmes/diabetes-h...,3 BES TO BEAT DIABETES [/programmes/diabetes-h...,program-sub-pages,diabetes-hub-v2
361,1468676,Program Sub level 1,https://www.healthhub.sg/programmes/1test/sya-...,HealthHub\nRelaxation-ExerciseHealthhub\nList ...,program-sub-pages,1test
362,1435215,If you have coronary heart disease | Diabetes Hub,https://www.healthhub.sg/programmes/diabetes-h...,3 BES TO BEAT DIABETES [/programmes/diabetes-h...,program-sub-pages,diabetes-hub-v2


<hr>

### Data Understanding

- No. of unique values for `secondLvl` = No. of unique groups

In [83]:
df_new

Unnamed: 0,id,title,full_url,extracted_content_body,content_category,secondLvl
0,1434688,Screen for Life - National Health Screening Pr...,https://www.healthhub.sg/programmes/Screen_for...,We note that some users have experienced issue...,programs,Screen_for_Life
1,1434612,"Live Well, Age Well Programme",https://www.healthhub.sg/programmes/AAP,"By 7 February 2023, all users must perform a o...",programs,AAP
2,1434660,The Healthy 365 App,https://www.healthhub.sg/programmes/healthyliving,[https://go.gov.sg/useh365] [https://go.gov.s...,programs,healthyliving
3,1434580,"Eat, Drink, Shop Healthy Challenge",https://www.healthhub.sg/programmes/eat-drink-...,Menu [#clear]\n\nMenu [#clear]\n\nMenu [...,programs,eat-drink-shop-healthy-challenge
4,1434657,Great things start when you MOVE IT!,https://www.healthhub.sg/programmes/LetsMoveIt,[https://go.gov.sg/h365-moveit]\n\nGet Moving ...,programs,LetsMoveIt
...,...,...,...,...,...,...
359,1435127,Types of diabetes | Diabetes Hub,https://www.healthhub.sg/programmes/diabetes-h...,3 BEaTMS TO BEAT DIABETES [/programmes/diabete...,program-sub-pages,diabetes-hub-v2
360,1435167,Hypoglycaemia | Diabetes Hub,https://www.healthhub.sg/programmes/diabetes-h...,3 BES TO BEAT DIABETES [/programmes/diabetes-h...,program-sub-pages,diabetes-hub-v2
361,1468676,Program Sub level 1,https://www.healthhub.sg/programmes/1test/sya-...,HealthHub\nRelaxation-ExerciseHealthhub\nList ...,program-sub-pages,1test
362,1435215,If you have coronary heart disease | Diabetes Hub,https://www.healthhub.sg/programmes/diabetes-h...,3 BES TO BEAT DIABETES [/programmes/diabetes-h...,program-sub-pages,diabetes-hub-v2


Check the unqiue values for second level

In [84]:
unique_secondLvl = df_new['secondLvl'].unique()
unique_counts = df_new['secondLvl'].value_counts()
print(f"Number of unique values: {len(unique_secondLvl)}")
print("\nCounts for each unique value:\n", unique_counts)

Number of unique values: 67

Counts for each unique value:
 secondLvl
parent-hub                     72
MindSG                         71
diabetes-hub                   41
diabetes-hub-v2                25
pressure-injury                14
                               ..
breast-cancer-screening         1
healthy-pregnancy               1
colorectal-cancer-screening     1
personal-health-eservices       1
Copy-of-1test                   1
Name: count, Length: 66, dtype: int64


In [85]:
unique_secondLvl

array(['Screen_for_Life', 'AAP', 'healthyliving',
       'eat-drink-shop-healthy-challenge', 'LetsMoveIt', 'hsg',
       'parent-hub', 'nsc', 'nutrition-hub', 'vaccinate', 'diabetes-hub',
       'school_dental_programme', 'healthhub-rewards',
       'diabetes-mellitus', 'IQuit', 'hiv-prevention',
       'children-health-ehb', 'MindSG', 'korangok',
       'sexually-transmitted-infections', 'use-antibiotics-right',
       'healthy-workplace-ecosystem', 'MoveIt', 'health-promoting-malls',
       'strokehub', 'hygiene', 'StayWell', 'justcheckingin',
       'howareyoudoing', 'EmotionsExplorer', 'preventive-health',
       'pressure-injury', 'Fight_The_Spread', 'cervical-cancer-screening',
       'breast-cancer-screening', 'healthy-pregnancy',
       'colorectal-cancer-screening', 'personal-health-eservices',
       'nsc-participating-outlets', 'lose_to_win', 'community-challenge',
       'testestnew', 'healthytimeout', 'go-hunkle',
       'nsc-silver-challenge', 'lower-calorie-meals-can-be-

- Hence, for now 66 groups
- There is one url with nan
- Further analysis to understand groups, especially the smaller groups of 1/2

<hr>

### Form Grouped Dataframs

In [86]:
count = 1
grpList = []
# Create respective dfs for each unique group in `secondLvl`
for group in unique_secondLvl:
    if isinstance(group, str):
        # Replace invalid characters for variable name
        grpName = group.replace('-', '_').replace(' ', '_').replace('nan', 'NaN')
    else:
        grpName = 'NaN'
    # Create new df for each group
    globals()[f"df_{grpName}"] = df_new[df_new['secondLvl'] == group]
    print(f"{count}. Created df_{grpName}")
    grpList.append(f"df_{grpName}")
    count += 1

# Arrange grpList in descending order of count
grpList = sorted(grpList, key=lambda x: len(globals()[x]), reverse=True)
print("\n", grpList)

1. Created df_Screen_for_Life
2. Created df_AAP
3. Created df_healthyliving
4. Created df_eat_drink_shop_healthy_challenge
5. Created df_LetsMoveIt
6. Created df_hsg
7. Created df_parent_hub
8. Created df_nsc
9. Created df_nutrition_hub
10. Created df_vaccinate
11. Created df_diabetes_hub
12. Created df_school_dental_programme
13. Created df_healthhub_rewards
14. Created df_diabetes_mellitus
15. Created df_IQuit
16. Created df_hiv_prevention
17. Created df_children_health_ehb
18. Created df_MindSG
19. Created df_korangok
20. Created df_sexually_transmitted_infections
21. Created df_use_antibiotics_right
22. Created df_healthy_workplace_ecosystem
23. Created df_MoveIt
24. Created df_health_promoting_malls
25. Created df_strokehub
26. Created df_hygiene
27. Created df_StayWell
28. Created df_justcheckingin
29. Created df_howareyoudoing
30. Created df_EmotionsExplorer
31. Created df_preventive_health
32. Created df_pressure_injury
33. Created df_Fight_The_Spread
34. Created df_cervical_ca

<hr>

### Group Understanding

In [37]:
# Loop through list & display  first few rows of each DataFrame
for df_name in grpList:
    index = grpList.index(df_name)
    print(f"{index+1}. DataFrame: {df_name}")
    globals()[df_name].info()
    display(globals()[df_name].head())
    print("*"*150)

1. DataFrame: df_parent_hub
<class 'pandas.core.frame.DataFrame'>
Index: 72 entries, 6 to 273
Data columns (total 6 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   id                      72 non-null     int64 
 1   title                   72 non-null     object
 2   full_url                72 non-null     object
 3   extracted_content_body  72 non-null     object
 4   content_category        72 non-null     object
 5   secondLvl               72 non-null     object
dtypes: int64(1), object(5)
memory usage: 3.9+ KB


Unnamed: 0,id,title,full_url,extracted_content_body,content_category,secondLvl
6,1453161,Parent Hub,https://www.healthhub.sg/programmes/parent-hub,"As parents, you can inspire your children to a...",programs,parent-hub
63,1435221,Parent Hub: Student Immunisation And Screening,https://www.healthhub.sg/programmes/parent-hub...,CHILD IMMUNISATION AND SCREENING SERVICES\nThe...,program-sub-pages,parent-hub
77,1434753,Parent Hub: 0-2 Years - Healthy Diet,https://www.healthhub.sg/programmes/parent-hub...,fo \n MEAL TIMES To view all content in this s...,program-sub-pages,parent-hub
79,1435359,Parent Hub: 0-2 Years,https://www.healthhub.sg/programmes/parent-hub...,Here's How to Team Up with Your Wife for Paren...,program-sub-pages,parent-hub
80,1435231,"Parent Hub: Student Health Centre, Dental Centre",https://www.healthhub.sg/programmes/parent-hub...,STUDENT HEALTH CENTRE AND STUDENT DENTAL CENTR...,program-sub-pages,parent-hub


******************************************************************************************************************************************************
2. DataFrame: df_MindSG
<class 'pandas.core.frame.DataFrame'>
Index: 71 entries, 18 to 310
Data columns (total 6 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   id                      71 non-null     int64 
 1   title                   71 non-null     object
 2   full_url                71 non-null     object
 3   extracted_content_body  71 non-null     object
 4   content_category        71 non-null     object
 5   secondLvl               71 non-null     object
dtypes: int64(1), object(5)
memory usage: 3.9+ KB


Unnamed: 0,id,title,full_url,extracted_content_body,content_category,secondLvl
18,1434644,MindSG,https://www.healthhub.sg/programmes/MindSG,Redirecting to Discover [/programmes/MindSG/Di...,programs,MindSG
61,1434919,MindSG,https://www.healthhub.sg/programmes/MindSG/Car...,Caring for Ourselves\nSleeping Well\nSelect th...,program-sub-pages,MindSG
67,1434871,MindSG,https://www.healthhub.sg/programmes/MindSG/Dis...,Are we giving the right support?\nLearn how we...,program-sub-pages,MindSG
70,1435236,MindSG,https://www.healthhub.sg/programmes/MindSG/Sle...,[#helplines] [#helplines]\n\nSleep Tracking F...,program-sub-pages,MindSG
71,1435243,MindSG,https://www.healthhub.sg/programmes/MindSG/See...,Seeking Support\nChoose what youd like to read...,program-sub-pages,MindSG


******************************************************************************************************************************************************
3. DataFrame: df_diabetes_hub
<class 'pandas.core.frame.DataFrame'>
Index: 41 entries, 10 to 303
Data columns (total 6 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   id                      41 non-null     int64 
 1   title                   41 non-null     object
 2   full_url                41 non-null     object
 3   extracted_content_body  41 non-null     object
 4   content_category        41 non-null     object
 5   secondLvl               41 non-null     object
dtypes: int64(1), object(5)
memory usage: 2.2+ KB


Unnamed: 0,id,title,full_url,extracted_content_body,content_category,secondLvl
10,1434652,3 Be's To Beat Diabetes | Diabetes Hub,https://www.healthhub.sg/programmes/diabetes-hub,3 BES TO BEAT DIABETES [/programmes/diabetes-h...,programs,diabetes-hub
11,1434614,Diabetes Hub: Guide to Managing Diabetes,https://www.healthhub.sg/programmes/diabetes-hub,3 BES TO BEAT DIABETES [/programmes/diabetes-h...,programs,diabetes-hub
106,1435164,Self-monitoring of blood sugar | Diabetes Hub,https://www.healthhub.sg/programmes/diabetes-h...,3 BES TO BEAT DIABETES [/programmes/diabetes-h...,program-sub-pages,diabetes-hub
147,1435281,Diabetes Hub: Guide to Managing Diabetes,https://www.healthhub.sg/programmes/diabetes-h...,3 BES TO BEAT DIABETES [/programmes/diabetes-h...,program-sub-pages,diabetes-hub
148,1435129,Be Aware - What is diabetes,https://www.healthhub.sg/programmes/diabetes-h...,3 BES TO BEAT DIABETES [/programmes/diabetes-h...,program-sub-pages,diabetes-hub


******************************************************************************************************************************************************
4. DataFrame: df_diabetes_hub_v2
<class 'pandas.core.frame.DataFrame'>
Index: 25 entries, 322 to 362
Data columns (total 6 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   id                      25 non-null     int64 
 1   title                   25 non-null     object
 2   full_url                25 non-null     object
 3   extracted_content_body  25 non-null     object
 4   content_category        25 non-null     object
 5   secondLvl               25 non-null     object
dtypes: int64(1), object(5)
memory usage: 1.4+ KB


Unnamed: 0,id,title,full_url,extracted_content_body,content_category,secondLvl
322,1435175,Travelling overseas | Diabetes Hub,https://www.healthhub.sg/programmes/diabetes-h...,3 BEaTMS TO BEAT DIABETES [/programmes/diabete...,program-sub-pages,diabetes-hub-v2
324,1435159,Sleeping well | Diabetes Hub,https://www.healthhub.sg/programmes/diabetes-h...,3 BES TO BEAT DIABETES [/programmes/diabetes-h...,program-sub-pages,diabetes-hub-v2
325,1435136,Healthy eating | Diabetes Hub,https://www.healthhub.sg/programmes/diabetes-h...,3 BES TO BEAT DIABETES [/programmes/diabetes-h...,program-sub-pages,diabetes-hub-v2
327,1435156,Stigma of diabetes | Diabetes Hub,https://www.healthhub.sg/programmes/diabetes-h...,3 BES TO BEAT DIABETES [/programmes/diabetes-h...,program-sub-pages,diabetes-hub-v2
328,1435147,Avoid smoking and drinking | Diabetes Hub,https://www.healthhub.sg/programmes/diabetes-h...,3 BEaTMS TO BEAT DIABETES [/programmes/diabete...,program-sub-pages,diabetes-hub-v2


******************************************************************************************************************************************************
5. DataFrame: df_nsc
<class 'pandas.core.frame.DataFrame'>
Index: 14 entries, 7 to 276
Data columns (total 6 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   id                      14 non-null     int64 
 1   title                   14 non-null     object
 2   full_url                14 non-null     object
 3   extracted_content_body  14 non-null     object
 4   content_category        14 non-null     object
 5   secondLvl               14 non-null     object
dtypes: int64(1), object(5)
memory usage: 784.0+ bytes


Unnamed: 0,id,title,full_url,extracted_content_body,content_category,secondLvl
7,1453207,National Steps Challenge™,https://www.healthhub.sg/programmes/nsc,Previous [#nscMastheadCarousel] Next [#nscMast...,programs,nsc
64,1434809,National Steps Challenge™,https://www.healthhub.sg/programmes/nsc/tracke...,< Previous [#nscMastheadCarousel] > Next [#nsc...,program-sub-pages,nsc
76,1434811,National Steps Challenge™,https://www.healthhub.sg/programmes/nsc/support/,Previous [#nscMastheadCarousel] > Next [#nscMa...,program-sub-pages,nsc
99,1434803,National Steps Challenge™,https://www.healthhub.sg/programmes/nsc/corpor...,< /> Previous [#nscMastheadCarousel] > Next [#...,program-sub-pages,nsc
183,1434969,National Steps Challenge™,https://www.healthhub.sg/programmes/nsc/corpor...,<Previous [#nscMastheadCarousel] >Next [#nscMa...,program-sub-pages,nsc


******************************************************************************************************************************************************
6. DataFrame: df_pressure_injury
<class 'pandas.core.frame.DataFrame'>
Index: 14 entries, 32 to 253
Data columns (total 6 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   id                      14 non-null     int64 
 1   title                   14 non-null     object
 2   full_url                14 non-null     object
 3   extracted_content_body  14 non-null     object
 4   content_category        14 non-null     object
 5   secondLvl               14 non-null     object
dtypes: int64(1), object(5)
memory usage: 784.0+ bytes


Unnamed: 0,id,title,full_url,extracted_content_body,content_category,secondLvl
32,1434628,Pressure Injury Hub,https://www.healthhub.sg/programmes/pressure-i...,Welcome to\n\nPressure Injury Hub\nA Pressure ...,programs,pressure-injury
144,1435123,Pressure Injury Hub,https://www.healthhub.sg/programmes/pressure-i...,Menu - Pressure Injury Hub\n- Preventing Press...,program-sub-pages,pressure-injury
180,1435275,Pressure Injury Hub,https://www.healthhub.sg/programmes/pressure-i...,Menu - Pressure Injury Hub\n- Preventing Press...,program-sub-pages,pressure-injury
185,1435149,Pressure Injury Hub,https://www.healthhub.sg/programmes/pressure-i...,Menu - Pressure Injury Hub\n- Preventing Press...,program-sub-pages,pressure-injury
231,1435116,Pressure Injury Hub,https://www.healthhub.sg/programmes/pressure-i...,Menu - Pressure Injury Hub\n- Preventing Press...,program-sub-pages,pressure-injury


******************************************************************************************************************************************************
7. DataFrame: df_nutrition_hub
<class 'pandas.core.frame.DataFrame'>
Index: 12 entries, 8 to 279
Data columns (total 6 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   id                      12 non-null     int64 
 1   title                   12 non-null     object
 2   full_url                12 non-null     object
 3   extracted_content_body  12 non-null     object
 4   content_category        12 non-null     object
 5   secondLvl               12 non-null     object
dtypes: int64(1), object(5)
memory usage: 672.0+ bytes


Unnamed: 0,id,title,full_url,extracted_content_body,content_category,secondLvl
8,1434656,Nutrition Facts and Tips on Eating Healthy,https://www.healthhub.sg/programmes/nutrition-hub,Menu [#clear]\n\nMenu [#clear]\n\nMenu [...,programs,nutrition-hub
65,1435018,Reduce Your Salt And Sugar Intake,https://www.healthhub.sg/programmes/nutrition-...,Menu [#] [#clear]\n\nMenu [#] [#clear]\n\n...,program-sub-pages,nutrition-hub
66,1472348,Nutri-Grade,https://www.healthhub.sg/programmes/nutrition-...,Menu [#clear]\n\nMenu [#clear]\n\nMenu [...,program-sub-pages,nutrition-hub
68,1435021,Make Healthy Food & Grocery Choices,https://www.healthhub.sg/programmes/nutrition-...,Menu [#clear]\n\nResources\nPick up useful t...,program-sub-pages,nutrition-hub
69,1435017,Nutritious Foods For A Healthy Diet,https://www.healthhub.sg/programmes/nutrition-...,Menu [#clear]\n\nEat More\nEat more nutritio...,program-sub-pages,nutrition-hub


******************************************************************************************************************************************************
8. DataFrame: df_LetsMoveIt
<class 'pandas.core.frame.DataFrame'>
Index: 9 entries, 4 to 258
Data columns (total 6 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   id                      9 non-null      int64 
 1   title                   9 non-null      object
 2   full_url                9 non-null      object
 3   extracted_content_body  9 non-null      object
 4   content_category        9 non-null      object
 5   secondLvl               9 non-null      object
dtypes: int64(1), object(5)
memory usage: 504.0+ bytes


Unnamed: 0,id,title,full_url,extracted_content_body,content_category,secondLvl
4,1434657,Great things start when you MOVE IT!,https://www.healthhub.sg/programmes/LetsMoveIt,[https://go.gov.sg/h365-moveit]\n\nGet Moving ...,programs,LetsMoveIt
62,1480345,Great things start when you MOVE IT!,https://www.healthhub.sg/programmes/LetsMoveIt...,[https://go.gov.sg/useh365] [https://go.gov.s...,program-sub-pages,LetsMoveIt
100,1435259,Great things start when you MOVE IT!,https://www.healthhub.sg/programmes/LetsMoveIt...,Your quick guide to book an event\nFor a pdf v...,program-sub-pages,LetsMoveIt
113,1435239,Great things start when you MOVE IT!,https://www.healthhub.sg/programmes/LetsMoveIt...,Ready for a bigger challenge? Explore a variet...,program-sub-pages,LetsMoveIt
119,1435229,Great things start when you MOVE IT!,https://www.healthhub.sg/programmes/LetsMoveIt...,Well done on taking the first steps to an acti...,program-sub-pages,LetsMoveIt


******************************************************************************************************************************************************
9. DataFrame: df_AAP
<class 'pandas.core.frame.DataFrame'>
Index: 8 entries, 1 to 176
Data columns (total 6 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   id                      8 non-null      int64 
 1   title                   8 non-null      object
 2   full_url                8 non-null      object
 3   extracted_content_body  8 non-null      object
 4   content_category        8 non-null      object
 5   secondLvl               8 non-null      object
dtypes: int64(1), object(5)
memory usage: 448.0+ bytes


Unnamed: 0,id,title,full_url,extracted_content_body,content_category,secondLvl
1,1434612,"Live Well, Age Well Programme",https://www.healthhub.sg/programmes/AAP,"By 7 February 2023, all users must perform a o...",programs,AAP
85,1434907,"See, Hear & Eat Better",https://www.healthhub.sg/programmes/AAP/functi...,"Project Silver Screen is an affordable, nation...",program-sub-pages,AAP
109,1434895,7 Easy Exercises to an Active Lifestyle,https://www.healthhub.sg/programmes/AAP/easy-e...,Back to Healthy Ageing [/programmes/Healthy_Ag...,program-sub-pages,AAP
122,1434903,You can spot a stroke,https://www.healthhub.sg/programmes/AAP/stroke/,Back to Healthy Ageing [/programmes/Healthy_Ag...,program-sub-pages,AAP
137,1434905,Age Healthier When You Cook Right And Eat Smart,https://www.healthhub.sg/programmes/AAP/nutrit...,Back to Healthy Ageing [http://www.healthhub.s...,program-sub-pages,AAP


******************************************************************************************************************************************************
10. DataFrame: df_healthia
<class 'pandas.core.frame.DataFrame'>
Index: 7 entries, 57 to 363
Data columns (total 6 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   id                      7 non-null      int64 
 1   title                   7 non-null      object
 2   full_url                7 non-null      object
 3   extracted_content_body  7 non-null      object
 4   content_category        7 non-null      object
 5   secondLvl               7 non-null      object
dtypes: int64(1), object(5)
memory usage: 392.0+ bytes


Unnamed: 0,id,title,full_url,extracted_content_body,content_category,secondLvl
57,1434618,Healthia,https://www.healthhub.sg/programmes/healthia,[/programs/Lists/Programs/index.html]\n- Home\...,programs,healthia
323,1434805,Dr Kimberly Douglas,https://www.healthhub.sg/programmes/healthia/d...,[/programs/Lists/Program Sub Pages/index.html]...,program-sub-pages,healthia
333,1434777,About Us,https://www.healthhub.sg/programmes/healthia/a...,[/programs/Lists/Program Sub Pages/index.html]...,program-sub-pages,healthia
346,1434801,level-1,https://www.healthhub.sg/programmes/healthia/l...,athis is level 1 content,program-sub-pages,healthia
350,1434796,Our Team,https://www.healthhub.sg/programmes/healthia/t...,[/programs/Lists/Program Sub Pages/index.html]...,program-sub-pages,healthia


******************************************************************************************************************************************************
11. DataFrame: df_healthhub__parenting
<class 'pandas.core.frame.DataFrame'>
Index: 7 entries, 315 to 321
Data columns (total 6 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   id                      7 non-null      int64 
 1   title                   7 non-null      object
 2   full_url                7 non-null      object
 3   extracted_content_body  7 non-null      object
 4   content_category        7 non-null      object
 5   secondLvl               7 non-null      object
dtypes: int64(1), object(5)
memory usage: 392.0+ bytes


Unnamed: 0,id,title,full_url,extracted_content_body,content_category,secondLvl
315,1434831,Be Gentle With Yourself,https://www.healthhub.sg/programmes/healthhub-...,Excited to hold your baby in your arms but fee...,program-sub-pages,healthhub--parenting
316,1434921,Happy 1-Year Old!,https://www.healthhub.sg/programmes/healthhub-...,Your baby is 1 year old! YouaTMve made it!\nFo...,program-sub-pages,healthhub--parenting
317,1434889,Is Your Baby Throwing Things Around?,https://www.healthhub.sg/programmes/healthhub-...,Cannot determine when your baby is cute or nau...,program-sub-pages,healthhub--parenting
318,1434873,Separation Anxiety,https://www.healthhub.sg/programmes/healthhub-...,Preparing to go back to work? Help baby cope w...,program-sub-pages,healthhub--parenting
319,1434911,"Time For Baby, Time For Mummy And Daddy",https://www.healthhub.sg/programmes/healthhub-...,"Is your baby anxious around strangers, clingin...",program-sub-pages,healthhub--parenting


******************************************************************************************************************************************************
12. DataFrame: df_IQuit
<class 'pandas.core.frame.DataFrame'>
Index: 6 entries, 15 to 305
Data columns (total 6 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   id                      6 non-null      int64 
 1   title                   6 non-null      object
 2   full_url                6 non-null      object
 3   extracted_content_body  6 non-null      object
 4   content_category        6 non-null      object
 5   secondLvl               6 non-null      object
dtypes: int64(1), object(5)
memory usage: 336.0+ bytes


Unnamed: 0,id,title,full_url,extracted_content_body,content_category,secondLvl
15,1434716,I Quit Programme,https://www.healthhub.sg/programmes/IQuit,Toggle to find out more.\nI Quit\nProgramme Al...,programs,IQuit
83,1435084,Vape,https://www.healthhub.sg/programmes/IQuit/e-cig/,Toggle to find out more. I Quit Programme [/pr...,program-sub-pages,IQuit
116,1435096,Vape,https://www.healthhub.sg/programmes/IQuit/e-ci...,Toggle to find out more. I Quit Programme [/pr...,program-sub-pages,IQuit
239,1500502,Vape,https://www.healthhub.sg/programmes/IQuit/e-ci...,I Quit Programme [/programmes/88/IQuit/#home]\...,program-sub-pages,IQuit
304,1476504,Vape,https://www.healthhub.sg/programmes/IQuit/e-ci...,I Quit Programme [/programmes/88/IQuit/#home]\...,program-sub-pages,IQuit


******************************************************************************************************************************************************
13. DataFrame: df_korangok
<class 'pandas.core.frame.DataFrame'>
Index: 5 entries, 19 to 245
Data columns (total 6 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   id                      5 non-null      int64 
 1   title                   5 non-null      object
 2   full_url                5 non-null      object
 3   extracted_content_body  5 non-null      object
 4   content_category        5 non-null      object
 5   secondLvl               5 non-null      object
dtypes: int64(1), object(5)
memory usage: 280.0+ bytes


Unnamed: 0,id,title,full_url,extracted_content_body,content_category,secondLvl
19,1434636,Korang OK?,https://www.healthhub.sg/programmes/korangok,[/programmes/korangok]\n- Home\n- Healthier Ch...,programs,korangok
161,1435115,Korang OK? - Fun Ways to Stay Active,https://www.healthhub.sg/programmes/korangok/f...,[/programmes/korangok/]\n- Home\n- Healthier C...,program-sub-pages,korangok
240,1435109,Korang OK? - Mental Well-being,https://www.healthhub.sg/programmes/korangok/m...,[/programmes/korangok/]\n- Home\n- Healthier C...,program-sub-pages,korangok
241,1435113,Korang OK? - IQuit for Good,https://www.healthhub.sg/programmes/korangok/q...,[/programmes/korangok/]\n- Home\n- Healthier C...,program-sub-pages,korangok
245,1435111,Korang OK? - Screen for Life,https://www.healthhub.sg/programmes/korangok/s...,[/programmes/korangok/]\n- Home\n- Healthier C...,program-sub-pages,korangok


******************************************************************************************************************************************************
14. DataFrame: df_MoveIt
<class 'pandas.core.frame.DataFrame'>
Index: 5 entries, 23 to 311
Data columns (total 6 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   id                      5 non-null      int64 
 1   title                   5 non-null      object
 2   full_url                5 non-null      object
 3   extracted_content_body  5 non-null      object
 4   content_category        5 non-null      object
 5   secondLvl               5 non-null      object
dtypes: int64(1), object(5)
memory usage: 280.0+ bytes


Unnamed: 0,id,title,full_url,extracted_content_body,content_category,secondLvl
23,1434598,Great things start when you MOVE IT!,https://www.healthhub.sg/programmes/MoveIt,"By 7 February 2023, all users must perform a o...",programs,MoveIt
98,1435347,Great things start when you MOVE IT!,https://www.healthhub.sg/programmes/MoveIt/mov...,"Starting 12 July 2021, participants who do not...",program-sub-pages,MoveIt
123,1435340,Great things start when you MOVE IT!,https://www.healthhub.sg/programmes/MoveIt/mov...,"Starting 12 July 2021, participants who do not...",program-sub-pages,MoveIt
126,1434986,Great things start when you MOVE IT!,https://www.healthhub.sg/programmes/MoveIt/mov...,"Starting 12 July 2021, participants who do not...",program-sub-pages,MoveIt
311,1435344,Great things start when you MOVE IT!,https://www.healthhub.sg/programmes/MoveIt/mov...,In line with MOHs advisory on\n20 July 2021\no...,program-sub-pages,MoveIt


******************************************************************************************************************************************************
15. DataFrame: df_Screen_for_Life
<class 'pandas.core.frame.DataFrame'>
Index: 4 entries, 0 to 93
Data columns (total 6 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   id                      4 non-null      int64 
 1   title                   4 non-null      object
 2   full_url                4 non-null      object
 3   extracted_content_body  4 non-null      object
 4   content_category        4 non-null      object
 5   secondLvl               4 non-null      object
dtypes: int64(1), object(5)
memory usage: 224.0+ bytes


Unnamed: 0,id,title,full_url,extracted_content_body,content_category,secondLvl
0,1434688,Screen for Life - National Health Screening Pr...,https://www.healthhub.sg/programmes/Screen_for...,We note that some users have experienced issue...,programs,Screen_for_Life
82,1435054,Screen for Life - National Health Screening Pr...,https://www.healthhub.sg/programmes/Screen_for...,START YOUR SCREENING JOURNEY HERE\n- 18 - 24 y...,program-sub-pages,Screen_for_Life
90,1435061,Screen for Life - National Health Screening Pr...,https://www.healthhub.sg/programmes/Screen_for...,We have received feedback from some members of...,program-sub-pages,Screen_for_Life
93,1435053,Screen for Life - National Health Screening Pr...,https://www.healthhub.sg/programmes/Screen_for...,Download our Screen for Life booklet: English ...,program-sub-pages,Screen_for_Life


******************************************************************************************************************************************************
16. DataFrame: df_ga_testing
<class 'pandas.core.frame.DataFrame'>
Index: 3 entries, 48 to 348
Data columns (total 6 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   id                      3 non-null      int64 
 1   title                   3 non-null      object
 2   full_url                3 non-null      object
 3   extracted_content_body  3 non-null      object
 4   content_category        3 non-null      object
 5   secondLvl               3 non-null      object
dtypes: int64(1), object(5)
memory usage: 168.0+ bytes


Unnamed: 0,id,title,full_url,extracted_content_body,content_category,secondLvl
48,1434640,GA-Testing,https://www.healthhub.sg/programmes/ga-testing,Social media share\nShare\npersona A: [/progr...,programs,ga-testing
338,1435245,Persona B,https://www.healthhub.sg/programmes/ga-testing...,a-,program-sub-pages,ga-testing
348,1435247,Persona A,https://www.healthhub.sg/programmes/ga-testing...,a-,program-sub-pages,ga-testing


******************************************************************************************************************************************************
17. DataFrame: df_1test
<class 'pandas.core.frame.DataFrame'>
Index: 3 entries, 51 to 361
Data columns (total 6 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   id                      3 non-null      int64 
 1   title                   3 non-null      object
 2   full_url                3 non-null      object
 3   extracted_content_body  3 non-null      object
 4   content_category        3 non-null      object
 5   secondLvl               3 non-null      object
dtypes: int64(1), object(5)
memory usage: 168.0+ bytes


Unnamed: 0,id,title,full_url,extracted_content_body,content_category,secondLvl
51,1468234,1test1test,https://www.healthhub.sg/programmes/1test,test program entry - 15/11/2023\nabc 123,programs,1test
337,1468237,1test-sub-sya333,https://www.healthhub.sg/programmes/1test/1tes...,Sub-program entry - 15/11/2023 - sya33333 [htt...,program-sub-pages,1test
361,1468676,Program Sub level 1,https://www.healthhub.sg/programmes/1test/sya-...,HealthHub\nRelaxation-ExerciseHealthhub\nList ...,program-sub-pages,1test


******************************************************************************************************************************************************
18. DataFrame: df_indian_outreach
<class 'pandas.core.frame.DataFrame'>
Index: 3 entries, 247 to 313
Data columns (total 6 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   id                      3 non-null      int64 
 1   title                   3 non-null      object
 2   full_url                3 non-null      object
 3   extracted_content_body  3 non-null      object
 4   content_category        3 non-null      object
 5   secondLvl               3 non-null      object
dtypes: int64(1), object(5)
memory usage: 168.0+ bytes


Unnamed: 0,id,title,full_url,extracted_content_body,content_category,secondLvl
247,1434980,Take the first step with your loved ones,https://www.healthhub.sg/programmes/indian_out...,- Menu\n- Home\n- Healthy Eating\n- Physical A...,program-sub-pages,indian_outreach
312,1434979,Take the first step with your loved ones,https://www.healthhub.sg/programmes/indian_out...,- Menu\n- Home\n- Healthy Eating\n- Physical A...,program-sub-pages,indian_outreach
313,1434983,Take the first step with your loved ones,https://www.healthhub.sg/programmes/indian_out...,- Menu\n- Home\n- Healthy Eating\n- Physical A...,program-sub-pages,indian_outreach


******************************************************************************************************************************************************
19. DataFrame: df_10_fun_ways_to_get_active
<class 'pandas.core.frame.DataFrame'>
Index: 3 entries, 343 to 354
Data columns (total 6 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   id                      3 non-null      int64 
 1   title                   3 non-null      object
 2   full_url                3 non-null      object
 3   extracted_content_body  3 non-null      object
 4   content_category        3 non-null      object
 5   secondLvl               3 non-null      object
dtypes: int64(1), object(5)
memory usage: 168.0+ bytes


Unnamed: 0,id,title,full_url,extracted_content_body,content_category,secondLvl
343,1435336,test page level 1 - 14-10-2020,https://www.healthhub.sg/programmes/10-fun-way...,athis is a test page,program-sub-pages,10-fun-ways-to-get-active
352,1435343,test page level 2-14-10-2020,https://www.healthhub.sg/programmes/10-fun-way...,atest page level 2,program-sub-pages,10-fun-ways-to-get-active
354,1435339,test page level 3-14-10-2020,https://www.healthhub.sg/programmes/10-fun-way...,atest page level 3,program-sub-pages,10-fun-ways-to-get-active


******************************************************************************************************************************************************
20. DataFrame: df_healthhub_rewards
<class 'pandas.core.frame.DataFrame'>
Index: 2 entries, 13 to 78
Data columns (total 6 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   id                      2 non-null      int64 
 1   title                   2 non-null      object
 2   full_url                2 non-null      object
 3   extracted_content_body  2 non-null      object
 4   content_category        2 non-null      object
 5   secondLvl               2 non-null      object
dtypes: int64(1), object(5)
memory usage: 112.0+ bytes


Unnamed: 0,id,title,full_url,extracted_content_body,content_category,secondLvl
13,1434642,HPB Rewards Programme,https://www.healthhub.sg/programmes/healthhub-...,"By 7 February 2023, all users must perform a o...",programs,healthhub-rewards
78,1434795,HPB Rewards Programme,https://www.healthhub.sg/programmes/healthhub-...,Frequently Asked Questions\n\nHPB Healthy 365 ...,program-sub-pages,healthhub-rewards


******************************************************************************************************************************************************
21. DataFrame: df_howareyoudoing
<class 'pandas.core.frame.DataFrame'>
Index: 2 entries, 29 to 202
Data columns (total 6 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   id                      2 non-null      int64 
 1   title                   2 non-null      object
 2   full_url                2 non-null      object
 3   extracted_content_body  2 non-null      object
 4   content_category        2 non-null      object
 5   secondLvl               2 non-null      object
dtypes: int64(1), object(5)
memory usage: 112.0+ bytes


Unnamed: 0,id,title,full_url,extracted_content_body,content_category,secondLvl
29,1434616,Take the first step with your loved ones,https://www.healthhub.sg/programmes/howareyoud...,[https://form.gov.sg/6448dcba7765f10012211b34]...,programs,howareyoudoing
202,1434977,Take the first step with your loved ones,https://www.healthhub.sg/programmes/howareyoud...,- Menu\n- Home\n- Healthy Eating\n- Physical A...,program-sub-pages,howareyoudoing


******************************************************************************************************************************************************
22. DataFrame: df_1test_221123
<class 'pandas.core.frame.DataFrame'>
Index: 2 entries, 52 to 340
Data columns (total 6 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   id                      2 non-null      int64 
 1   title                   2 non-null      object
 2   full_url                2 non-null      object
 3   extracted_content_body  2 non-null      object
 4   content_category        2 non-null      object
 5   secondLvl               2 non-null      object
dtypes: int64(1), object(5)
memory usage: 112.0+ bytes


Unnamed: 0,id,title,full_url,extracted_content_body,content_category,secondLvl
52,1469303,1test 221123,https://www.healthhub.sg/programmes/1test-221123,meal-log-tool-faqs [https://ch-api.healthhub.s...,programs,1test-221123
340,1482060,test72,https://www.healthhub.sg/programmes/1test-2211...,School Beverage List Apr 2024 As of Mar 2024 [...,program-sub-pages,1test-221123


******************************************************************************************************************************************************
23. DataFrame: df_1test_Program
<class 'pandas.core.frame.DataFrame'>
Index: 2 entries, 326 to 341
Data columns (total 6 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   id                      2 non-null      int64 
 1   title                   2 non-null      object
 2   full_url                2 non-null      object
 3   extracted_content_body  2 non-null      object
 4   content_category        2 non-null      object
 5   secondLvl               2 non-null      object
dtypes: int64(1), object(5)
memory usage: 112.0+ bytes


Unnamed: 0,id,title,full_url,extracted_content_body,content_category,secondLvl
326,1474295,1tes23,https://www.healthhub.sg/programmes/1test-Prog...,EDSH_Challenge_Event_Locations [https://ch-api...,program-sub-pages,1test-Program
341,1468228,1test programsub1,https://www.healthhub.sg/programmes/1test-Prog...,IQuit-Terms-and-Conditions_8Feb24 [https://ch-...,program-sub-pages,1test-Program


******************************************************************************************************************************************************
24. DataFrame: df_healthyliving
<class 'pandas.core.frame.DataFrame'>
Index: 1 entries, 2 to 2
Data columns (total 6 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   id                      1 non-null      int64 
 1   title                   1 non-null      object
 2   full_url                1 non-null      object
 3   extracted_content_body  1 non-null      object
 4   content_category        1 non-null      object
 5   secondLvl               1 non-null      object
dtypes: int64(1), object(5)
memory usage: 56.0+ bytes


Unnamed: 0,id,title,full_url,extracted_content_body,content_category,secondLvl
2,1434660,The Healthy 365 App,https://www.healthhub.sg/programmes/healthyliving,[https://go.gov.sg/useh365] [https://go.gov.s...,programs,healthyliving


******************************************************************************************************************************************************
25. DataFrame: df_eat_drink_shop_healthy_challenge
<class 'pandas.core.frame.DataFrame'>
Index: 1 entries, 3 to 3
Data columns (total 6 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   id                      1 non-null      int64 
 1   title                   1 non-null      object
 2   full_url                1 non-null      object
 3   extracted_content_body  1 non-null      object
 4   content_category        1 non-null      object
 5   secondLvl               1 non-null      object
dtypes: int64(1), object(5)
memory usage: 56.0+ bytes


Unnamed: 0,id,title,full_url,extracted_content_body,content_category,secondLvl
3,1434580,"Eat, Drink, Shop Healthy Challenge",https://www.healthhub.sg/programmes/eat-drink-...,Menu [#clear]\n\nMenu [#clear]\n\nMenu [...,programs,eat-drink-shop-healthy-challenge


******************************************************************************************************************************************************
26. DataFrame: df_hsg
<class 'pandas.core.frame.DataFrame'>
Index: 1 entries, 5 to 5
Data columns (total 6 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   id                      1 non-null      int64 
 1   title                   1 non-null      object
 2   full_url                1 non-null      object
 3   extracted_content_body  1 non-null      object
 4   content_category        1 non-null      object
 5   secondLvl               1 non-null      object
dtypes: int64(1), object(5)
memory usage: 56.0+ bytes


Unnamed: 0,id,title,full_url,extracted_content_body,content_category,secondLvl
5,1434650,Healthier SG | Enrol via HealthHub,https://www.healthhub.sg/programmes/hsg,"[#] \nHealthier You, with Healthier SG Enrol n...",programs,hsg


******************************************************************************************************************************************************
27. DataFrame: df_vaccinate
<class 'pandas.core.frame.DataFrame'>
Index: 1 entries, 9 to 9
Data columns (total 6 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   id                      1 non-null      int64 
 1   title                   1 non-null      object
 2   full_url                1 non-null      object
 3   extracted_content_body  1 non-null      object
 4   content_category        1 non-null      object
 5   secondLvl               1 non-null      object
dtypes: int64(1), object(5)
memory usage: 56.0+ bytes


Unnamed: 0,id,title,full_url,extracted_content_body,content_category,secondLvl
9,1434610,STAY ONE STEP AHEAD WITH VACCINATIONS,https://www.healthhub.sg/programmes/vaccinate,65 years old and above?\nClick here to book yo...,programs,vaccinate


******************************************************************************************************************************************************
28. DataFrame: df_school_dental_programme
<class 'pandas.core.frame.DataFrame'>
Index: 1 entries, 12 to 12
Data columns (total 6 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   id                      1 non-null      int64 
 1   title                   1 non-null      object
 2   full_url                1 non-null      object
 3   extracted_content_body  1 non-null      object
 4   content_category        1 non-null      object
 5   secondLvl               1 non-null      object
dtypes: int64(1), object(5)
memory usage: 56.0+ bytes


Unnamed: 0,id,title,full_url,extracted_content_body,content_category,secondLvl
12,1434596,Youth Preventive Dental Service,https://www.healthhub.sg/programmes/school_den...,YPDS also provides free basic dental services ...,programs,school_dental_programme


******************************************************************************************************************************************************
29. DataFrame: df_diabetes_mellitus
<class 'pandas.core.frame.DataFrame'>
Index: 1 entries, 14 to 14
Data columns (total 6 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   id                      1 non-null      int64 
 1   title                   1 non-null      object
 2   full_url                1 non-null      object
 3   extracted_content_body  1 non-null      object
 4   content_category        1 non-null      object
 5   secondLvl               1 non-null      object
dtypes: int64(1), object(5)
memory usage: 56.0+ bytes


Unnamed: 0,id,title,full_url,extracted_content_body,content_category,secondLvl
14,1434708,Let’s BEAT Diabetes,https://www.healthhub.sg/programmes/diabetes-m...,[#smitha-popup-close]\n\nCan I Really Reverse ...,programs,diabetes-mellitus


******************************************************************************************************************************************************
30. DataFrame: df_hiv_prevention
<class 'pandas.core.frame.DataFrame'>
Index: 1 entries, 16 to 16
Data columns (total 6 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   id                      1 non-null      int64 
 1   title                   1 non-null      object
 2   full_url                1 non-null      object
 3   extracted_content_body  1 non-null      object
 4   content_category        1 non-null      object
 5   secondLvl               1 non-null      object
dtypes: int64(1), object(5)
memory usage: 56.0+ bytes


Unnamed: 0,id,title,full_url,extracted_content_body,content_category,secondLvl
16,1434574,HIV and AIDS,https://www.healthhub.sg/programmes/hiv-preven...,- WHAT IS HIV AND AIDS?\n- HOW DOES HIV SPREAD...,programs,hiv-prevention


******************************************************************************************************************************************************
31. DataFrame: df_children_health_ehb
<class 'pandas.core.frame.DataFrame'>
Index: 1 entries, 17 to 17
Data columns (total 6 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   id                      1 non-null      int64 
 1   title                   1 non-null      object
 2   full_url                1 non-null      object
 3   extracted_content_body  1 non-null      object
 4   content_category        1 non-null      object
 5   secondLvl               1 non-null      object
dtypes: int64(1), object(5)
memory usage: 56.0+ bytes


Unnamed: 0,id,title,full_url,extracted_content_body,content_category,secondLvl
17,1434585,Children’s Health E-Services,https://www.healthhub.sg/programmes/children-h...,The Smarter Way LogBirth information and growt...,programs,children-health-ehb


******************************************************************************************************************************************************
32. DataFrame: df_sexually_transmitted_infections
<class 'pandas.core.frame.DataFrame'>
Index: 1 entries, 20 to 20
Data columns (total 6 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   id                      1 non-null      int64 
 1   title                   1 non-null      object
 2   full_url                1 non-null      object
 3   extracted_content_body  1 non-null      object
 4   content_category        1 non-null      object
 5   secondLvl               1 non-null      object
dtypes: int64(1), object(5)
memory usage: 56.0+ bytes


Unnamed: 0,id,title,full_url,extracted_content_body,content_category,secondLvl
20,1462073,Sexually Transmitted Infections,https://www.healthhub.sg/programmes/sexually-t...,[#hiv-and-aids_mobile_dropdown_menu]\n- WHAT I...,programs,sexually-transmitted-infections


******************************************************************************************************************************************************
33. DataFrame: df_use_antibiotics_right
<class 'pandas.core.frame.DataFrame'>
Index: 1 entries, 21 to 21
Data columns (total 6 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   id                      1 non-null      int64 
 1   title                   1 non-null      object
 2   full_url                1 non-null      object
 3   extracted_content_body  1 non-null      object
 4   content_category        1 non-null      object
 5   secondLvl               1 non-null      object
dtypes: int64(1), object(5)
memory usage: 56.0+ bytes


Unnamed: 0,id,title,full_url,extracted_content_body,content_category,secondLvl
21,1434592,Misuse of Antibiotics puts you at risk.,https://www.healthhub.sg/programmes/use-antibi...,Antibiotics do not work on viruses\nAntibiotic...,programs,use-antibiotics-right


******************************************************************************************************************************************************
34. DataFrame: df_healthy_workplace_ecosystem
<class 'pandas.core.frame.DataFrame'>
Index: 1 entries, 22 to 22
Data columns (total 6 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   id                      1 non-null      int64 
 1   title                   1 non-null      object
 2   full_url                1 non-null      object
 3   extracted_content_body  1 non-null      object
 4   content_category        1 non-null      object
 5   secondLvl               1 non-null      object
dtypes: int64(1), object(5)
memory usage: 56.0+ bytes


Unnamed: 0,id,title,full_url,extracted_content_body,content_category,secondLvl
22,1434662,Healthy Workplace Ecosystem,https://www.healthhub.sg/programmes/healthy-wo...,HEALTHY LIVING AT THE WORKPLACE\n\nSTAYING ACT...,programs,healthy-workplace-ecosystem


******************************************************************************************************************************************************
35. DataFrame: df_health_promoting_malls
<class 'pandas.core.frame.DataFrame'>
Index: 1 entries, 24 to 24
Data columns (total 6 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   id                      1 non-null      int64 
 1   title                   1 non-null      object
 2   full_url                1 non-null      object
 3   extracted_content_body  1 non-null      object
 4   content_category        1 non-null      object
 5   secondLvl               1 non-null      object
dtypes: int64(1), object(5)
memory usage: 56.0+ bytes


Unnamed: 0,id,title,full_url,extracted_content_body,content_category,secondLvl
24,1434666,Health Promoting Malls,https://www.healthhub.sg/programmes/health-pro...,Announcement:\n1. Mall Workout Cancellation Ka...,programs,health-promoting-malls


******************************************************************************************************************************************************
36. DataFrame: df_strokehub
<class 'pandas.core.frame.DataFrame'>
Index: 1 entries, 25 to 25
Data columns (total 6 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   id                      1 non-null      int64 
 1   title                   1 non-null      object
 2   full_url                1 non-null      object
 3   extracted_content_body  1 non-null      object
 4   content_category        1 non-null      object
 5   secondLvl               1 non-null      object
dtypes: int64(1), object(5)
memory usage: 56.0+ bytes


Unnamed: 0,id,title,full_url,extracted_content_body,content_category,secondLvl
25,1434588,Stroke Hub,https://www.healthhub.sg/programmes/strokehub,A Resource for\nStroke Survivors and\ntheir Ca...,programs,strokehub


******************************************************************************************************************************************************
37. DataFrame: df_hygiene
<class 'pandas.core.frame.DataFrame'>
Index: 1 entries, 26 to 26
Data columns (total 6 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   id                      1 non-null      int64 
 1   title                   1 non-null      object
 2   full_url                1 non-null      object
 3   extracted_content_body  1 non-null      object
 4   content_category        1 non-null      object
 5   secondLvl               1 non-null      object
dtypes: int64(1), object(5)
memory usage: 56.0+ bytes


Unnamed: 0,id,title,full_url,extracted_content_body,content_category,secondLvl
26,1434648,Get Hands-on With Health,https://www.healthhub.sg/programmes/hygiene,1. 2. \n3. 4. \n\nGet hands-on with health.\...,programs,hygiene


******************************************************************************************************************************************************
38. DataFrame: df_StayWell
<class 'pandas.core.frame.DataFrame'>
Index: 1 entries, 27 to 27
Data columns (total 6 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   id                      1 non-null      int64 
 1   title                   1 non-null      object
 2   full_url                1 non-null      object
 3   extracted_content_body  1 non-null      object
 4   content_category        1 non-null      object
 5   secondLvl               1 non-null      object
dtypes: int64(1), object(5)
memory usage: 56.0+ bytes


Unnamed: 0,id,title,full_url,extracted_content_body,content_category,secondLvl
27,1434624,Stay Well to Stay Strong,https://www.healthhub.sg/programmes/StayWell,Click here for the latest COVID-19 UPDATES [ht...,programs,StayWell


******************************************************************************************************************************************************
39. DataFrame: df_justcheckingin
<class 'pandas.core.frame.DataFrame'>
Index: 1 entries, 28 to 28
Data columns (total 6 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   id                      1 non-null      int64 
 1   title                   1 non-null      object
 2   full_url                1 non-null      object
 3   extracted_content_body  1 non-null      object
 4   content_category        1 non-null      object
 5   secondLvl               1 non-null      object
dtypes: int64(1), object(5)
memory usage: 56.0+ bytes


Unnamed: 0,id,title,full_url,extracted_content_body,content_category,secondLvl
28,1434632,Hi #JustCheckingIn,https://www.healthhub.sg/programmes/justchecki...,[#equip-yourself]\nEquip Yourself Before The C...,programs,justcheckingin


******************************************************************************************************************************************************
40. DataFrame: df_EmotionsExplorer
<class 'pandas.core.frame.DataFrame'>
Index: 1 entries, 30 to 30
Data columns (total 6 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   id                      1 non-null      int64 
 1   title                   1 non-null      object
 2   full_url                1 non-null      object
 3   extracted_content_body  1 non-null      object
 4   content_category        1 non-null      object
 5   secondLvl               1 non-null      object
dtypes: int64(1), object(5)
memory usage: 56.0+ bytes


Unnamed: 0,id,title,full_url,extracted_content_body,content_category,secondLvl
30,1434638,How are you feeling today?,https://www.healthhub.sg/programmes/EmotionsEx...,Click here [/programmes/MindSG/EmotionsExplore...,programs,EmotionsExplorer


******************************************************************************************************************************************************
41. DataFrame: df_preventive_health
<class 'pandas.core.frame.DataFrame'>
Index: 1 entries, 31 to 31
Data columns (total 6 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   id                      1 non-null      int64 
 1   title                   1 non-null      object
 2   full_url                1 non-null      object
 3   extracted_content_body  1 non-null      object
 4   content_category        1 non-null      object
 5   secondLvl               1 non-null      object
dtypes: int64(1), object(5)
memory usage: 56.0+ bytes


Unnamed: 0,id,title,full_url,extracted_content_body,content_category,secondLvl
31,1434654,Preventive Health,https://www.healthhub.sg/programmes/preventive...,- Screen For Life\n- FIGHT the Spread\n- Lets ...,programs,preventive-health


******************************************************************************************************************************************************
42. DataFrame: df_Fight_The_Spread
<class 'pandas.core.frame.DataFrame'>
Index: 1 entries, 33 to 33
Data columns (total 6 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   id                      1 non-null      int64 
 1   title                   1 non-null      object
 2   full_url                1 non-null      object
 3   extracted_content_body  1 non-null      object
 4   content_category        1 non-null      object
 5   secondLvl               1 non-null      object
dtypes: int64(1), object(5)
memory usage: 56.0+ bytes


Unnamed: 0,id,title,full_url,extracted_content_body,content_category,secondLvl
33,1434676,FIGHT The Spread Of Infectious Diseases,https://www.healthhub.sg/programmes/Fight_The_...,"- ABOUT F.I.G.H.T.\n- INFLUENZA\n- HAND, FOOT ...",programs,Fight_The_Spread


******************************************************************************************************************************************************
43. DataFrame: df_cervical_cancer_screening
<class 'pandas.core.frame.DataFrame'>
Index: 1 entries, 34 to 34
Data columns (total 6 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   id                      1 non-null      int64 
 1   title                   1 non-null      object
 2   full_url                1 non-null      object
 3   extracted_content_body  1 non-null      object
 4   content_category        1 non-null      object
 5   secondLvl               1 non-null      object
dtypes: int64(1), object(5)
memory usage: 56.0+ bytes


Unnamed: 0,id,title,full_url,extracted_content_body,content_category,secondLvl
34,1434604,Screen for Life - National Cervical Cancer Scr...,https://www.healthhub.sg/programmes/cervical-c...,What is cervical cancer and HPV?\n- What is ce...,programs,cervical-cancer-screening


******************************************************************************************************************************************************
44. DataFrame: df_breast_cancer_screening
<class 'pandas.core.frame.DataFrame'>
Index: 1 entries, 35 to 35
Data columns (total 6 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   id                      1 non-null      int64 
 1   title                   1 non-null      object
 2   full_url                1 non-null      object
 3   extracted_content_body  1 non-null      object
 4   content_category        1 non-null      object
 5   secondLvl               1 non-null      object
dtypes: int64(1), object(5)
memory usage: 56.0+ bytes


Unnamed: 0,id,title,full_url,extracted_content_body,content_category,secondLvl
35,1434634,Screen for Life - Breast Cancer Screening Prog...,https://www.healthhub.sg/programmes/breast-can...,- Am I at risk of breast cancer?\n- How do I p...,programs,breast-cancer-screening


******************************************************************************************************************************************************
45. DataFrame: df_healthy_pregNaNcy
<class 'pandas.core.frame.DataFrame'>
Index: 1 entries, 36 to 36
Data columns (total 6 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   id                      1 non-null      int64 
 1   title                   1 non-null      object
 2   full_url                1 non-null      object
 3   extracted_content_body  1 non-null      object
 4   content_category        1 non-null      object
 5   secondLvl               1 non-null      object
dtypes: int64(1), object(5)
memory usage: 56.0+ bytes


Unnamed: 0,id,title,full_url,extracted_content_body,content_category,secondLvl
36,1434571,Healthy Pregnancy,https://www.healthhub.sg/programmes/healthy-pr...,aTBC,programs,healthy-pregnancy


******************************************************************************************************************************************************
46. DataFrame: df_colorectal_cancer_screening
<class 'pandas.core.frame.DataFrame'>
Index: 1 entries, 37 to 37
Data columns (total 6 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   id                      1 non-null      int64 
 1   title                   1 non-null      object
 2   full_url                1 non-null      object
 3   extracted_content_body  1 non-null      object
 4   content_category        1 non-null      object
 5   secondLvl               1 non-null      object
dtypes: int64(1), object(5)
memory usage: 56.0+ bytes


Unnamed: 0,id,title,full_url,extracted_content_body,content_category,secondLvl
37,1434620,Screen for Life - National Colorectal Cancer S...,https://www.healthhub.sg/programmes/colorectal...,- What is colorectal cancer?\n- Screening Meth...,programs,colorectal-cancer-screening


******************************************************************************************************************************************************
47. DataFrame: df_personal_health_eservices
<class 'pandas.core.frame.DataFrame'>
Index: 1 entries, 38 to 38
Data columns (total 6 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   id                      1 non-null      int64 
 1   title                   1 non-null      object
 2   full_url                1 non-null      object
 3   extracted_content_body  1 non-null      object
 4   content_category        1 non-null      object
 5   secondLvl               1 non-null      object
dtypes: int64(1), object(5)
memory usage: 56.0+ bytes


Unnamed: 0,id,title,full_url,extracted_content_body,content_category,secondLvl
38,1434584,Get the latest updates on your family’s health,https://www.healthhub.sg/programmes/personal-h...,1. 2. \n3. 4. \n5. \nPrevious [#mastheadCar...,programs,personal-health-eservices


******************************************************************************************************************************************************
48. DataFrame: df_nsc_participating_outlets
<class 'pandas.core.frame.DataFrame'>
Index: 1 entries, 39 to 39
Data columns (total 6 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   id                      1 non-null      int64 
 1   title                   1 non-null      object
 2   full_url                1 non-null      object
 3   extracted_content_body  1 non-null      object
 4   content_category        1 non-null      object
 5   secondLvl               1 non-null      object
dtypes: int64(1), object(5)
memory usage: 56.0+ bytes


Unnamed: 0,id,title,full_url,extracted_content_body,content_category,secondLvl
39,1434670,Authorised Service Providers,https://www.healthhub.sg/programmes/nsc-partic...,1. \nPrevious [#myCarousel] Next [#myCarousel...,programs,nsc-participating-outlets


******************************************************************************************************************************************************
49. DataFrame: df_lose_to_win
<class 'pandas.core.frame.DataFrame'>
Index: 1 entries, 40 to 40
Data columns (total 6 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   id                      1 non-null      int64 
 1   title                   1 non-null      object
 2   full_url                1 non-null      object
 3   extracted_content_body  1 non-null      object
 4   content_category        1 non-null      object
 5   secondLvl               1 non-null      object
dtypes: int64(1), object(5)
memory usage: 56.0+ bytes


Unnamed: 0,id,title,full_url,extracted_content_body,content_category,secondLvl
40,1434668,Lose To Win,https://www.healthhub.sg/programmes/lose_to_win,In line with MOHs advisory on\n20 July 2021\no...,programs,lose_to_win


******************************************************************************************************************************************************
50. DataFrame: df_community_challenge
<class 'pandas.core.frame.DataFrame'>
Index: 1 entries, 41 to 41
Data columns (total 6 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   id                      1 non-null      int64 
 1   title                   1 non-null      object
 2   full_url                1 non-null      object
 3   extracted_content_body  1 non-null      object
 4   content_category        1 non-null      object
 5   secondLvl               1 non-null      object
dtypes: int64(1), object(5)
memory usage: 56.0+ bytes


Unnamed: 0,id,title,full_url,extracted_content_body,content_category,secondLvl
41,1434570,National Steps Challenge™ Community Challenge,https://www.healthhub.sg/programmes/community-...,The National Steps Challenge\nSeason 5 Communi...,programs,community-challenge


******************************************************************************************************************************************************
51. DataFrame: df_testestnew
<class 'pandas.core.frame.DataFrame'>
Index: 1 entries, 42 to 42
Data columns (total 6 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   id                      1 non-null      int64 
 1   title                   1 non-null      object
 2   full_url                1 non-null      object
 3   extracted_content_body  1 non-null      object
 4   content_category        1 non-null      object
 5   secondLvl               1 non-null      object
dtypes: int64(1), object(5)
memory usage: 56.0+ bytes


Unnamed: 0,id,title,full_url,extracted_content_body,content_category,secondLvl
42,1488542,testestnew,https://www.healthhub.sg/programmes/testestnew,test Eat Drink Shop Healthy Challenge Event Lo...,programs,testestnew


******************************************************************************************************************************************************
52. DataFrame: df_healthytimeout
<class 'pandas.core.frame.DataFrame'>
Index: 1 entries, 43 to 43
Data columns (total 6 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   id                      1 non-null      int64 
 1   title                   1 non-null      object
 2   full_url                1 non-null      object
 3   extracted_content_body  1 non-null      object
 4   content_category        1 non-null      object
 5   secondLvl               1 non-null      object
dtypes: int64(1), object(5)
memory usage: 56.0+ bytes


Unnamed: 0,id,title,full_url,extracted_content_body,content_category,secondLvl
43,1434602,The Healthy Time Out Challenge 1 May to 31 Aug...,https://www.healthhub.sg/programmes/healthytim...,Make quality time more rewarding than ever! Si...,programs,healthytimeout


******************************************************************************************************************************************************
53. DataFrame: df_go_hunkle
<class 'pandas.core.frame.DataFrame'>
Index: 1 entries, 44 to 44
Data columns (total 6 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   id                      1 non-null      int64 
 1   title                   1 non-null      object
 2   full_url                1 non-null      object
 3   extracted_content_body  1 non-null      object
 4   content_category        1 non-null      object
 5   secondLvl               1 non-null      object
dtypes: int64(1), object(5)
memory usage: 56.0+ bytes


Unnamed: 0,id,title,full_url,extracted_content_body,content_category,secondLvl
44,1434608,"Don’t Uncle, Be Hunkle!",https://www.healthhub.sg/programmes/go-hunkle,Download the Healthy 365 app to sign up for fr...,programs,go-hunkle


******************************************************************************************************************************************************
54. DataFrame: df_nsc_silver_challenge
<class 'pandas.core.frame.DataFrame'>
Index: 1 entries, 45 to 45
Data columns (total 6 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   id                      1 non-null      int64 
 1   title                   1 non-null      object
 2   full_url                1 non-null      object
 3   extracted_content_body  1 non-null      object
 4   content_category        1 non-null      object
 5   secondLvl               1 non-null      object
dtypes: int64(1), object(5)
memory usage: 56.0+ bytes


Unnamed: 0,id,title,full_url,extracted_content_body,content_category,secondLvl
45,1434626,National Steps Challenge™ Silver Challenge,https://www.healthhub.sg/programmes/nsc-silver...,In line with MOHs advisory on\n20 July 2021\no...,programs,nsc-silver-challenge


******************************************************************************************************************************************************
55. DataFrame: df_lower_calorie_meals_can_be_satisfying
<class 'pandas.core.frame.DataFrame'>
Index: 1 entries, 46 to 46
Data columns (total 6 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   id                      1 non-null      int64 
 1   title                   1 non-null      object
 2   full_url                1 non-null      object
 3   extracted_content_body  1 non-null      object
 4   content_category        1 non-null      object
 5   secondLvl               1 non-null      object
dtypes: int64(1), object(5)
memory usage: 56.0+ bytes


Unnamed: 0,id,title,full_url,extracted_content_body,content_category,secondLvl
46,1434576,Lower-calorie meals can be satisfying,https://www.healthhub.sg/programmes/lower-calo...,"Lower in calories, just as satisfying\njust as...",programs,lower-calorie-meals-can-be-satisfying


******************************************************************************************************************************************************
56. DataFrame: df_national_steps_challenge_campus
<class 'pandas.core.frame.DataFrame'>
Index: 1 entries, 47 to 47
Data columns (total 6 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   id                      1 non-null      int64 
 1   title                   1 non-null      object
 2   full_url                1 non-null      object
 3   extracted_content_body  1 non-null      object
 4   content_category        1 non-null      object
 5   secondLvl               1 non-null      object
dtypes: int64(1), object(5)
memory usage: 56.0+ bytes


Unnamed: 0,id,title,full_url,extracted_content_body,content_category,secondLvl
47,1434622,National Steps Challenge™ Campus Challenge,https://www.healthhub.sg/programmes/national-s...,The National Steps Challenge\nSeason 5 Campus ...,programs,national-steps-challenge-campus


******************************************************************************************************************************************************
57. DataFrame: df_CH_TestSync_20240313
<class 'pandas.core.frame.DataFrame'>
Index: 1 entries, 49 to 49
Data columns (total 6 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   id                      1 non-null      int64 
 1   title                   1 non-null      object
 2   full_url                1 non-null      object
 3   extracted_content_body  1 non-null      object
 4   content_category        1 non-null      object
 5   secondLvl               1 non-null      object
dtypes: int64(1), object(5)
memory usage: 56.0+ bytes


Unnamed: 0,id,title,full_url,extracted_content_body,content_category,secondLvl
49,1489198,CH-TestSync-20240313,https://www.healthhub.sg/programmes/CH-TestSyn...,CH-TestSync-20240313,programs,CH-TestSync-20240313


******************************************************************************************************************************************************
58. DataFrame: df_swtss_shredder_prototype
<class 'pandas.core.frame.DataFrame'>
Index: 1 entries, 50 to 50
Data columns (total 6 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   id                      1 non-null      int64 
 1   title                   1 non-null      object
 2   full_url                1 non-null      object
 3   extracted_content_body  1 non-null      object
 4   content_category        1 non-null      object
 5   secondLvl               1 non-null      object
dtypes: int64(1), object(5)
memory usage: 56.0+ bytes


Unnamed: 0,id,title,full_url,extracted_content_body,content_category,secondLvl
50,1434630,SWTSS Shredder Prototype,https://www.healthhub.sg/programmes/swtss-shre...,Click here for the latest COVID-19 UPDATES [ht...,programs,swtss-shredder-prototype


******************************************************************************************************************************************************
59. DataFrame: df_wholegrains_food_truck
<class 'pandas.core.frame.DataFrame'>
Index: 1 entries, 53 to 53
Data columns (total 6 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   id                      1 non-null      int64 
 1   title                   1 non-null      object
 2   full_url                1 non-null      object
 3   extracted_content_body  1 non-null      object
 4   content_category        1 non-null      object
 5   secondLvl               1 non-null      object
dtypes: int64(1), object(5)
memory usage: 56.0+ bytes


Unnamed: 0,id,title,full_url,extracted_content_body,content_category,secondLvl
53,1434582,Wholegrains Food Truck,https://www.healthhub.sg/programmes/wholegrain...,[/programmes/42/get-healthy#what-is-wholegrain...,programs,wholegrains-food-truck


******************************************************************************************************************************************************
60. DataFrame: df_tampines_healthy_pathway
<class 'pandas.core.frame.DataFrame'>
Index: 1 entries, 54 to 54
Data columns (total 6 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   id                      1 non-null      int64 
 1   title                   1 non-null      object
 2   full_url                1 non-null      object
 3   extracted_content_body  1 non-null      object
 4   content_category        1 non-null      object
 5   secondLvl               1 non-null      object
dtypes: int64(1), object(5)
memory usage: 56.0+ bytes


Unnamed: 0,id,title,full_url,extracted_content_body,content_category,secondLvl
54,1434674,Healthy Pathway & Healthy Community,https://www.healthhub.sg/programmes/tampines-h...,Every step you take gets you closer to great p...,programs,tampines-healthy-pathway


******************************************************************************************************************************************************
61. DataFrame: df_vaping
<class 'pandas.core.frame.DataFrame'>
Index: 1 entries, 55 to 55
Data columns (total 6 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   id                      1 non-null      int64 
 1   title                   1 non-null      object
 2   full_url                1 non-null      object
 3   extracted_content_body  1 non-null      object
 4   content_category        1 non-null      object
 5   secondLvl               1 non-null      object
dtypes: int64(1), object(5)
memory usage: 56.0+ bytes


Unnamed: 0,id,title,full_url,extracted_content_body,content_category,secondLvl
55,1434600,E-Cigarettes,https://www.healthhub.sg/programmes/vaping,What are E-cigarettes?\n\nInside an E-cigarett...,programs,vaping


******************************************************************************************************************************************************
62. DataFrame: df_1test_987backup
<class 'pandas.core.frame.DataFrame'>
Index: 1 entries, 56 to 56
Data columns (total 6 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   id                      1 non-null      int64 
 1   title                   1 non-null      object
 2   full_url                1 non-null      object
 3   extracted_content_body  1 non-null      object
 4   content_category        1 non-null      object
 5   secondLvl               1 non-null      object
dtypes: int64(1), object(5)
memory usage: 56.0+ bytes


Unnamed: 0,id,title,full_url,extracted_content_body,content_category,secondLvl
56,1468627,1test-2,https://www.healthhub.sg/programmes/1test-987b...,FS Schedule Jun-Aug 2024 as at 10 Jun 2024 [ht...,programs,1test-987backup


******************************************************************************************************************************************************
63. DataFrame: df_national_steps_challenge_youth
<class 'pandas.core.frame.DataFrame'>
Index: 1 entries, 58 to 58
Data columns (total 6 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   id                      1 non-null      int64 
 1   title                   1 non-null      object
 2   full_url                1 non-null      object
 3   extracted_content_body  1 non-null      object
 4   content_category        1 non-null      object
 5   secondLvl               1 non-null      object
dtypes: int64(1), object(5)
memory usage: 56.0+ bytes


Unnamed: 0,id,title,full_url,extracted_content_body,content_category,secondLvl
58,1434594,National Steps Challenge™ Youth Challenge,https://www.healthhub.sg/programmes/national-s...,The National Steps Challenge\nSeason 5 Youth C...,programs,national-steps-challenge-youth


******************************************************************************************************************************************************
64. DataFrame: df_PDG_TestA
<class 'pandas.core.frame.DataFrame'>
Index: 1 entries, 59 to 59
Data columns (total 6 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   id                      1 non-null      int64 
 1   title                   1 non-null      object
 2   full_url                1 non-null      object
 3   extracted_content_body  1 non-null      object
 4   content_category        1 non-null      object
 5   secondLvl               1 non-null      object
dtypes: int64(1), object(5)
memory usage: 56.0+ bytes


Unnamed: 0,id,title,full_url,extracted_content_body,content_category,secondLvl
59,1476359,PDG-TestA,https://www.healthhub.sg/programmes/PDG-TestA,Assets [https://ch-api.healthhub.sg/api/public...,programs,PDG-TestA


******************************************************************************************************************************************************
65. DataFrame: df_get_healthhub_track
<class 'pandas.core.frame.DataFrame'>
Index: 1 entries, 314 to 314
Data columns (total 6 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   id                      1 non-null      int64 
 1   title                   1 non-null      object
 2   full_url                1 non-null      object
 3   extracted_content_body  1 non-null      object
 4   content_category        1 non-null      object
 5   secondLvl               1 non-null      object
dtypes: int64(1), object(5)
memory usage: 56.0+ bytes


Unnamed: 0,id,title,full_url,extracted_content_body,content_category,secondLvl
314,1435311,FAQs,https://www.healthhub.sg/programmes/get-health...,HealthHub Track Login: Facebook and Email Logi...,program-sub-pages,get-healthhub-track


******************************************************************************************************************************************************
66. DataFrame: df_Copy_of_1test
<class 'pandas.core.frame.DataFrame'>
Index: 1 entries, 353 to 353
Data columns (total 6 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   id                      1 non-null      int64 
 1   title                   1 non-null      object
 2   full_url                1 non-null      object
 3   extracted_content_body  1 non-null      object
 4   content_category        1 non-null      object
 5   secondLvl               1 non-null      object
dtypes: int64(1), object(5)
memory usage: 56.0+ bytes


Unnamed: 0,id,title,full_url,extracted_content_body,content_category,secondLvl
353,1470565,test article,https://www.healthhub.sg/programmes/Copy-of-1t...,Website List_As of 31 Oct 2023 [https://ch-api...,program-sub-pages,Copy-of-1test


******************************************************************************************************************************************************
67. DataFrame: df_NaN
<class 'pandas.core.frame.DataFrame'>
Index: 0 entries
Data columns (total 6 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   id                      0 non-null      int64 
 1   title                   0 non-null      object
 2   full_url                0 non-null      object
 3   extracted_content_body  0 non-null      object
 4   content_category        0 non-null      object
 5   secondLvl               0 non-null      object
dtypes: int64(1), object(5)
memory usage: 0.0+ bytes


Unnamed: 0,id,title,full_url,extracted_content_body,content_category,secondLvl


******************************************************************************************************************************************************


#### Summary of each Group

Legend:
- *, #, ~: to be grouped together
- -: to be removed

1. parent-hub:              taking care of children
2. MindSG:                  mental well being
3. diabetes-hub:            *diabetes
4. diabetes-hub-v2:         *diabetes, url have 404 error, can combine with above group
5. pressure-injury:         pressure injury
6. nsc:                     national steps challenge
7. nutrition-hub:           nurition
8. LetsMoveIt:              #physical activity
9. AAP:                     live well, age well (elderly)
10. healthhub--parenting:    taking care of baby
11. IQuit:                   ~vaping
12. MoveIt:                  #physical activity
13. korangok:                malay community
14. indian_outreach:         indian community
15. Screen_for_Life:         screenings
16. healthia:                healthia clinic information (take note bcos info seems off)
17. 1test-Program:            - API test calls (remove)
18. 1test:                    - API test calls (remove)
19. 1test-221123:             - API test calls (remove)
20. healthhub-rewards:        hh rewards
21. get-healthhub-track:      hh track
22. howareyoudoing:           - menu list (remove)
23. Copy-of-1test:            ~parent child vaping
24. ga-testing                - gibbish content (remove)
25. 10-fun-ways-to-get-active - test content (remove)

Hence, total expected a final of 16 groups.

<hr>

### Data Preparation

Final Clean, Removal & Merge of Dfs

#### Initial Group List

In [38]:
print(len(grpList))
grpList

67


['df_parent_hub',
 'df_MindSG',
 'df_diabetes_hub',
 'df_diabetes_hub_v2',
 'df_nsc',
 'df_pressure_injury',
 'df_nutrition_hub',
 'df_LetsMoveIt',
 'df_AAP',
 'df_healthia',
 'df_healthhub__parenting',
 'df_IQuit',
 'df_korangok',
 'df_MoveIt',
 'df_Screen_for_Life',
 'df_ga_testing',
 'df_1test',
 'df_indian_outreach',
 'df_10_fun_ways_to_get_active',
 'df_healthhub_rewards',
 'df_howareyoudoing',
 'df_1test_221123',
 'df_1test_Program',
 'df_healthyliving',
 'df_eat_drink_shop_healthy_challenge',
 'df_hsg',
 'df_vaccinate',
 'df_school_dental_programme',
 'df_diabetes_mellitus',
 'df_hiv_prevention',
 'df_children_health_ehb',
 'df_sexually_transmitted_infections',
 'df_use_antibiotics_right',
 'df_healthy_workplace_ecosystem',
 'df_health_promoting_malls',
 'df_strokehub',
 'df_hygiene',
 'df_StayWell',
 'df_justcheckingin',
 'df_EmotionsExplorer',
 'df_preventive_health',
 'df_Fight_The_Spread',
 'df_cervical_cancer_screening',
 'df_breast_cancer_screening',
 'df_healthy_pregNaNcy

#### Removal of redundant groups

After looking through how the characteristics of each groups/second level, I will be removing selected groups that are empty/extracted_content_body is irrelevant.

In [79]:
# To check cotent of selected removal of groups
# df_Copy_of_1test['extracted_content_body'].iloc[0]

'Website List_As of 31 Oct 2023 [https://ch-api.healthhub.sg/api/public/content/209ba2981fb346898866c164540349c0] Synopsis: Learn proactive parenting strategies to discuss vaping with teenagers amid its rise in Singapore. Encourage open dialogue, offer support, identify root causes, dispel misconceptions, and provide access to professional help. Estimated read time: 4 mins\nWith the recent rise on youths vaping in Singapore, its important to broach this topic with your teenager early, even before they might encounter vaping or situations where their friends are vaping.\nBe Casual:\nYou can bring up the topic casually, like if you and your child see vaping content on social media, a vape report on the news, or someone vaping.\nBe Curious:\nAsk your child what they know or think about vaping. You might be surprised by how much they already know. Thank them for sharing their thoughts, fostering an open dialogue.\nBe Candid:\nThen, share your own feelings about vaping in a simple and open 

In [91]:
# Remove 17 groups
exclude = ['df_1test_Program', 'df_1test', 'df_1test_221123', 'df_howareyoudoing', 'df_healthia', 'df_indian_outreach', 'df_testestnew', 'df_10_fun_ways_to_get_active', 
           'df_EmotionsExplorer', 'df_healthy_pregNaNcy', 'df_CH_TestSync_20240313', 'df_ga_testing', 'df_korangok', 
           'df_swtss_shredder_prototype', 'df_wholegrains_food_truck', 'df_1test_987backup', 'df_PDG_TestA', 'df_NaN']
filtered_grpList = [df for df in grpList if df not in exclude]
print(len(filtered_grpList))
filtered_grpList

50


['df_parent_hub',
 'df_MindSG',
 'df_diabetes_hub',
 'df_diabetes_hub_v2',
 'df_nsc',
 'df_pressure_injury',
 'df_nutrition_hub',
 'df_LetsMoveIt',
 'df_AAP',
 'df_healthhub__parenting',
 'df_IQuit',
 'df_korangok',
 'df_MoveIt',
 'df_Screen_for_Life',
 'df_healthhub_rewards',
 'df_healthyliving',
 'df_eat_drink_shop_healthy_challenge',
 'df_hsg',
 'df_vaccinate',
 'df_school_dental_programme',
 'df_diabetes_mellitus',
 'df_hiv_prevention',
 'df_children_health_ehb',
 'df_sexually_transmitted_infections',
 'df_use_antibiotics_right',
 'df_healthy_workplace_ecosystem',
 'df_health_promoting_malls',
 'df_strokehub',
 'df_hygiene',
 'df_StayWell',
 'df_justcheckingin',
 'df_preventive_health',
 'df_Fight_The_Spread',
 'df_cervical_cancer_screening',
 'df_breast_cancer_screening',
 'df_colorectal_cancer_screening',
 'df_personal_health_eservices',
 'df_nsc_participating_outlets',
 'df_lose_to_win',
 'df_community_challenge',
 'df_healthytimeout',
 'df_go_hunkle',
 'df_nsc_silver_challenge'

#### Merge common groups together

I will also be combining groups/second levels that are similar in content together, in order to reduce the number of groups and make the groups themselves more condense.

In [94]:
print(df_preventive_health['extracted_content_body'].iloc[0])

- Screen For Life
- FIGHT the Spread
- Lets B.E.A.T. Diabetes
- Live Well, Age Well
- HIV

Breast Cancer Screening
Explore how you can prepare for a mammogram, what to expect, and post-screening information. Learn More [/programmes/61/Screen_for_Life/screening-journey/?age_group=50-and-above&gender=female#50-and-above-female-find_breast-cancer]
Cervical Cancer Screening
Discover your recommended test (Pap or HPV) and learn how to prepare for it. Learn More [/programmes/61/Screen_for_Life/screening-journey/?age_group=50-and-above&gender=female#50-and-above-female-find_cervical-cancer]
Chronic Disease Screening
Find out the diseases you can protect yourself against, the screening tests included, and post-screening details. Learn More [/programmes/61/Screen_for_Life/screening-journey/?age_group=50-and-above&gender=female#50-and-above-female-find_chronic-diseases]
Colorectal Cancer Screening
Learn more about the Faecal Immunochemical Test (FIT) kit and how to use it. Learn More [/programme

In [95]:
# * Run once only
df_parent_hub_concatenated = pd.concat([df_parent_hub, df_healthhub__parenting, df_school_dental_programme, df_healthytimeout, df_children_health_ehb], axis=0)
df_MindSG_concatenated = pd.concat([df_MindSG, df_justcheckingin, df_StayWell], axis=0)
df_diabetes_hub_concatenated = pd.concat([df_diabetes_hub, df_diabetes_hub_v2, df_diabetes_mellitus], axis=0)
df_LetsMoveIt_concatenated = pd.concat([df_LetsMoveIt, df_MoveIt, df_lose_to_win, df_go_hunkle], axis=0)
df_nsc_concatenated = pd.concat([df_nsc, df_national_steps_challenge_youth, df_tampines_healthy_pathway, df_national_steps_challenge_campus, df_nsc_silver_challenge, df_community_challenge, df_nsc_participating_outlets, df_health_promoting_malls, df_healthy_workplace_ecosystem], axis=0)
df_iquit_concatenated = pd.concat([df_IQuit, df_vaping, df_Copy_of_1test], axis=0)
df_AAP_concatenated = pd.concat([df_AAP, df_strokehub], axis=0)
df_screen_for_life_concatenated = pd.concat([df_Screen_for_Life, df_colorectal_cancer_screening, df_breast_cancer_screening, df_cervical_cancer_screening, df_vaccinate, df_hsg, df_personal_health_eservices], axis=0)
df_nutrition_hub_concatenated = pd.concat([df_nutrition_hub, df_eat_drink_shop_healthy_challenge, df_lower_calorie_meals_can_be_satisfying], axis=0)
df_sti_concatenated = pd.concat([df_hiv_prevention, df_sexually_transmitted_infections], axis=0)
df_healthy365_concatenated = pd.concat([df_healthhub_rewards, df_healthyliving, df_get_healthhub_track], axis=0)
df_preventive_health_concatenated = pd.concat([df_Fight_The_Spread, df_use_antibiotics_right, df_preventive_health, df_hygiene], axis=0)

# df_hiv_prevention, df_sexually_transmitted_infections

newGrps = ['df_parent_hub_concatenated', 'df_MindSG_concatenated', 'df_diabetes_hub_concatenated', 'df_LetsMoveIt_concatenated', 
           'df_nsc_concatenated', 'df_iquit_concatenated', 'df_AAP_concatenated', 'df_screen_for_life_concatenated', 
           'df_nutrition_hub_concatenated', 'df_sti_concatenated', 'df_healthy365_concatenated', 'df_preventive_health_concatenated']
for grp in newGrps:
    print(grp)
    filtered_grpList.append(grp)

exclude_mergedGrps = ['df_parent_hub', 'df_healthhub__parenting', 'df_school_dental_programme', 'df_healthytimeout', 'df_children_health_ehb', 'df_MindSG', 'df_justcheckingin', 'df_StayWell',
                      'df_diabetes_hub', 'df_diabetes_hub_v2', 'df_diabetes_mellitus', 'df_LetsMoveIt', 'df_MoveIt', 'df_lose_to_win', 'df_go_hunkle', 
                      'df_nsc', 'df_national_steps_challenge_youth', 'df_tampines_healthy_pathway', 'df_national_steps_challenge_campus', 'df_nsc_silver_challenge', 'df_community_challenge', 'df_nsc_participating_outlets', 'df_health_promoting_malls', 'df_healthy_workplace_ecosystem',
                      'df_IQuit', 'df_vaping', 'df_Copy_of_1test', 'df_AAP', 'df_strokehub', 'df_Screen_for_Life', 'df_colorectal_cancer_screening', 'df_breast_cancer_screening', 'df_cervical_cancer_screening', 'df_vaccinate', 'df_hsg', 'df_personal_health_eservices', 'df_preventive_health',
                      'df_nutrition_hub', 'df_eat_drink_shop_healthy_challenge', 'df_lower_calorie_meals_can_be_satisfying', 
                      'df_hiv_prevention', 'df_sexually_transmitted_infections', 'df_healthhub_rewards', 'df_healthyliving', 'df_get_healthhub_track', 'df_use_antibiotics_right', 'df_Fight_The_Spread', 'df_hygiene']

print('Number of groups to be combined', len(exclude_mergedGrps))

final_grpList = [df for df in filtered_grpList if df not in exclude_mergedGrps]
len(final_grpList)

df_parent_hub_concatenated
df_MindSG_concatenated
df_diabetes_hub_concatenated
df_LetsMoveIt_concatenated
df_nsc_concatenated
df_iquit_concatenated
df_AAP_concatenated
df_screen_for_life_concatenated
df_nutrition_hub_concatenated
df_sti_concatenated
df_healthy365_concatenated
df_preventive_health_concatenated


14

In [96]:
final_grpList

['df_pressure_injury',
 'df_korangok',
 'df_parent_hub_concatenated',
 'df_MindSG_concatenated',
 'df_diabetes_hub_concatenated',
 'df_LetsMoveIt_concatenated',
 'df_nsc_concatenated',
 'df_iquit_concatenated',
 'df_AAP_concatenated',
 'df_screen_for_life_concatenated',
 'df_nutrition_hub_concatenated',
 'df_sti_concatenated',
 'df_healthy365_concatenated',
 'df_preventive_health_concatenated']

#### Export to xlsx

In [98]:
# Sort DataFrame by their number of rows
sorted_final_grpList = sorted(final_grpList, key=lambda df_name: len(eval(df_name)), reverse=True)
print(sorted_final_grpList)

['df_parent_hub_concatenated', 'df_MindSG_concatenated', 'df_diabetes_hub_concatenated', 'df_nsc_concatenated', 'df_LetsMoveIt_concatenated', 'df_pressure_injury', 'df_nutrition_hub_concatenated', 'df_screen_for_life_concatenated', 'df_AAP_concatenated', 'df_iquit_concatenated', 'df_korangok', 'df_healthy365_concatenated', 'df_preventive_health_concatenated', 'df_sti_concatenated']


In [99]:
# keys = DataFrame names, values = DataFrame objects
dataframes = {name: globals()[name] for name in sorted_final_grpList}

# Save all DataFrames to Excel file
with pd.ExcelWriter(r"program-sub-pages\grouped_programSubpages.xlsx", engine='xlsxwriter') as writer:
    for df_name, df in dataframes.items():
        sheet_name = df_name.replace('df_', '')
        df.to_excel(writer, sheet_name=sheet_name, index=False)

print("DataFrames saved to 'grouped_programSubpages.xlsx'.")

DataFrames saved to 'grouped_programSubpages.xlsx'.


#### Markdown Summary of final dataframes:

Total number of groups: 14

- ✅ 'df_parent_hub_concatenated': parent + children + grandparents (mental well being, health, checkup)
    - 'df_children_health_ehb'
- ✅ 'df_MindSG': (mental well being)
    - 'df_justcheckingin'
    - 'df_healthytimeout'
- ✅ 'df_diabetes_hub_concatenated': diabetes
- ✅ 'df_nsc_concatenated': national steps challenge 
- ✅ 'df_pressure_injury', 
- ✅ 'df_LetsMoveIt_concatenated':
    - 'df_go_hunkle'
    - 'df_lose_to_win'
- ✅ 'df_nutrition_hub_concatenated',
    - 'df_lower_calorie_meals_can_be_satisfying'
- ✅ 'df_AAP': live well, age well (elderly)
    - 'df_strokehub'
- ✅ 'df_screen_for_life_concatenated': functional screening, vaccination
    - 'df_vaccinate'
    - 'df_personal_health_eservices'
- ✅ 'df_iquit_concatenated', 
- ✅ 'df_korangok', 
- ✅ 'df_healthy365' 
    - 'df_healthhub_rewards', 
    - 'df_get_healthhub_track'
    - 'df_healthyliving'
- ✅ 'df_sti':
    - 'df_hiv_prevention', 
    - 'df_sexually_transmitted_infections', 
- ✅ 'df_preventive_health':
    - 'df_antibiotics', 
    - 'df_hygiene', 
    - 'df_Fight_The_Spread',
    - 'df_preventive_health'

<hr>

### Important Note: Further Manual Cleaning

Further manual checking & cleaning was done on this excel file `grouped_programSubpages.xlsx`. It was found that some `extracted_content_body` did not make sense. As we will be performing topic/keyword modelling on this attribute, I have decided to remove such cases.

Removed rows under the respective sheets in `grouped_programSubpages.xlsx` are as follows:
1. diabetes_hub_concatenated: 
    - rows with column `id`: `1435297`, `1435302` & `1435299`
        - `extracted_content_body` is ‘aaa’ and ‘aa’ only