<a href="https://colab.research.google.com/github/RubyNixx/Pop_Health_Streamlit/blob/main/POPULATION_HEALTH_BY_URBAN_EXTENTS.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# POPULATION HEALTH BY URBAN EXTENTS
Population health based on synthetic data, visualised using a Streamlit Python application.

You will need to upload the two CSV files provided in the GitHub repo.

In [16]:
# Import python packages
!pip install streamlit pydeck h3 shapely

import pandas as pd
import h3
import pydeck as pdk
import streamlit as st
import json
from h3 import LatLngPoly, LatLngMultiPoly



## Curating the data

First, get the data. We are going to focus on the location, the sex, the body weight and finally each morbidity (cancer, diabetes, COPD, asthma and hypertension).

In [13]:
from google.colab import files

uploaded = files.upload()
# Upload both CSV files: the synthetic population
# ('DEFAULT_DATABASE.DEFAULT_SCHEMA.SYNTHETIC_POPULATION') and the built-up areas extract


Saving URBAN_EXTENTS_FOR_CITIES_TOWNS_AND_VILLAGES__GREAT_BRITAIN_OPEN_BUILT_UP_AREAS.PRS_OPEN_BUILT_UP_AREAS_SCH.PRS_OPEN_BUILT_UP_AREAS_TBL.csv to URBAN_EXTENTS_FOR_CITIES_TOWNS_AND_VILLAGES__GREAT_BRITAIN_OPEN_BUILT_UP_AREAS.PRS_OPEN_BUILT_UP_AREAS_SCH.PRS_OPEN_BUILT_UP_AREAS_TBL.csv
Saving DEFAULT_DATABASE.DEFAULT_SCHEMA.SYNTHETIC_POPULATION.csv to DEFAULT_DATABASE.DEFAULT_SCHEMA.SYNTHETIC_POPULATION (1).csv


In [32]:
# Load Data
population_health = pd.read_csv('DEFAULT_DATABASE.DEFAULT_SCHEMA.SYNTHETIC_POPULATION.csv')
built_up_areas = pd.read_csv('URBAN_EXTENTS_FOR_CITIES_TOWNS_AND_VILLAGES__GREAT_BRITAIN_OPEN_BUILT_UP_AREAS.PRS_OPEN_BUILT_UP_AREAS_SCH.PRS_OPEN_BUILT_UP_AREAS_TBL.csv')
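
The upload output above shows Colab saving a re-uploaded file under a new name ending in ` (1).csv`, while `pd.read_csv` still opens the older copy with the original name. The following is a minimal sketch of one way to pick the newest copy. The helper `latest_upload` is hypothetical (it is not part of the notebook), and it assumes that the highest counter marks the most recent upload.

```python
import re

def latest_upload(names, base_name):
    """Return the newest Colab copy of base_name from an iterable of file names.

    Colab saves a re-uploaded file as 'name (1).csv', 'name (2).csv', ...
    Assumption: the copy with the highest counter is the most recent one.
    """
    stem, dot, ext = base_name.rpartition('.')
    pattern = re.compile(
        rf'^{re.escape(stem)}(?: \((\d+)\))?{re.escape(dot + ext)}$'
    )
    best_name, best_n = None, -1
    for name in names:
        match = pattern.match(name)
        if match:
            n = int(match.group(1) or 0)  # no counter means the first upload
            if n > best_n:
                best_name, best_n = name, n
    if best_name is None:
        raise FileNotFoundError(f'No upload matching {base_name!r}')
    return best_name

names = [
    'DEFAULT_DATABASE.DEFAULT_SCHEMA.SYNTHETIC_POPULATION.csv',
    'DEFAULT_DATABASE.DEFAULT_SCHEMA.SYNTHETIC_POPULATION (1).csv',
]
print(latest_upload(names, 'DEFAULT_DATABASE.DEFAULT_SCHEMA.SYNTHETIC_POPULATION.csv'))
```

In the notebook, `names` could come from `os.listdir('.')`, and the returned name can be passed straight to `pd.read_csv`.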


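We will now bucket all the locations into **H3 indexes** at resolution 8, as we did in the previous exercise. Next, a new table is created called **POPULATION_HEALTH_H3** (the `population_h3` DataFrame in the code below), with one row per H3 cell, body weight and sex.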

In [34]:
def get_h3(lat, lon, res=8):
    return h3.latlng_to_cell(lat, lon, res)

population_health['H3'] = population_health.apply(lambda row: get_h3(row['LAT'], row['LON'], 8), axis=1)


In [36]:
population_h3 = (
    population_health.groupby(['H3','BODY_WEIGHT','SEX'])
    .agg(
        TOTAL_POPULATION=('BODY_WEIGHT','count'),
        CANCER=('CANCER','sum'),
        DIABETES=('DIABETES','sum'),
        COPD=('COPD','sum'),
        ASTHMA=('ASTHMA','sum'),
        HYPERTENSION=('HYPERTENSION','sum'),
        LAT=('LAT','mean'),
        LON=('LON','mean')
    )
    .reset_index()
)

Now let's look at the built-up urban areas.

In [38]:
# Display the first 5 rows
built_up_areas.head(5)

| Unnamed: 0.1 | Unnamed: 0 | GSSCODE | NAME1_TEXT | NAME1_LANGUAGE | NAME2_TEXT | NAME2_LANGUAGE | AREAHECTARES | GEOMETRY_AREA_M | GEOMETRY | GEOGRAPHY |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | S45001606 | Walkerburn | | | | 30.0 | 300000 | `MULTIPOLYGON (((335725 637225, 335725 637250, ...` | `{\n "coordinates": [\n [\n [\n ...` |
| 1 | 1 | E63008454 | Walkeringham | | | | 53.44 | 534375 | `MULTIPOLYGON (((476850 392125, 476825 392125, ...` | `{\n "coordinates": [\n [\n [\n ...` |
| 2 | 2 | E63011204 | Walkern | | | | 52.31 | 523125 | `MULTIPOLYGON (((528700 225575, 528700 225600, ...` | `{\n "coordinates": [\n [\n [\n ...` |
| 3 | 3 | E63007847 | Walkington | | | | 82.38 | 823750 | `MULTIPOLYGON (((499525 436725, 499500 436725, ...` | `{\n "coordinates": [\n [\n [\n ...` |
| 4 | 4 | E63008456 | Wallasey | | | | 1659.13 | 16591250 | `MULTIPOLYGON (((324750 389450, 324750 389425, ...` | `{\n "coordinates": [\n [\n [\n ...` |


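Next, we need to join the two datasets together, using the H3 index code as the join key. To do that, each town's polygon has to be split into the H3 cells it covers, at the same resolution (8) used for the population data. In Snowflake SQL you can use the function **H3_COVERAGE_STRINGS** for this; the Python code below uses `h3.h3shape_to_cells` from the h3 package instead.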

In [40]:
def parse_geojson(geojson_str):
    return json.loads(geojson_str)

def geojson_to_h3shape(geojson_obj):
    if geojson_obj['type'] == 'Polygon':
        outer = [[lat, lon] for lon, lat in geojson_obj['coordinates'][0]]
        holes = []
        if len(geojson_obj['coordinates']) > 1:
            for hole in geojson_obj['coordinates'][1:]:
                holes.append([[lat, lon] for lon, lat in hole])
        return LatLngPoly(outer, *holes)
    elif geojson_obj['type'] == 'MultiPolygon':
        polys = []
        for poly_coords in geojson_obj['coordinates']:
            outer = [[lat, lon] for lon, lat in poly_coords[0]]
            holes = []
            if len(poly_coords) > 1:
                for hole in poly_coords[1:]:
                    holes.append([[lat, lon] for lon, lat in hole])
            polys.append(LatLngPoly(outer, *holes))
        return LatLngMultiPoly(*polys)
    else:
        raise ValueError(f"Unsupported geometry type: {geojson_obj['type']}")

def h3shape_to_cells(h3shape, resolution=8):
    return list(h3.h3shape_to_cells(h3shape, resolution))

built_up_areas['geojson'] = built_up_areas['GEOGRAPHY'].apply(parse_geojson)
built_up_areas['h3shape'] = built_up_areas['geojson'].apply(geojson_to_h3shape)
built_up_areas['H3'] = built_up_areas['h3shape'].apply(h3shape_to_cells)
coverage = (
    built_up_areas[['GSSCODE', 'NAME1_TEXT', 'H3']]
    .explode('H3')
    .reset_index(drop=True)
)
print(coverage.head(10))


     GSSCODE    NAME1_TEXT               H3
0  S45001606    Walkerburn              NaN
1  E63008454  Walkeringham  881942a197fffff
2  E63011204       Walkern  88194e48d5fffff
3  E63007847    Walkington  88194058d3fffff
4  E63008456      Wallasey  8819510e0bfffff
5  E63008456      Wallasey  8819510e5dfffff
6  E63008456      Wallasey  8819510e03fffff
7  E63008456      Wallasey  8819510e1dfffff
8  E63008456      Wallasey  8819510e07fffff
9  E63008456      Wallasey  8819510e51fffff
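
One detail in `geojson_to_h3shape` is easy to miss: GeoJSON stores each point as `[lon, lat]`, whereas `LatLngPoly` expects `(lat, lon)`. The sketch below isolates that swap using only the standard library. The helper name `ring_to_latlng` and the sample coordinates are made up for illustration, and indexing by position (rather than unpacking two values) is a design choice so that an optional third altitude value does not break the conversion.

```python
import json

def ring_to_latlng(ring):
    """Convert one GeoJSON ring of [lon, lat, ...] points to (lat, lon) tuples."""
    # Index by position so a third (altitude) value, if present, is ignored.
    return [(point[1], point[0]) for point in ring]

# Illustrative polygon only; these are not coordinates from the dataset.
geojson_str = (
    '{"type": "Polygon", "coordinates": '
    '[[[-3.05, 53.40], [-3.00, 53.40], [-3.00, 53.45], [-3.05, 53.40]]]}'
)
polygon = json.loads(geojson_str)

# The first ring is the outer boundary; any further rings are holes.
outer = ring_to_latlng(polygon['coordinates'][0])
print(outer[0])  # (53.4, -3.05)
```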


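You will note that each area maps to multiple H3 cells, which the code above first holds as a list per row. `explode('H3')` then produces one row per cell, and we keep only **GSSCODE**, **NAME1_TEXT** and **H3**. An area that yields no cells at resolution 8, such as Walkerburn in the output above, is left with `NaN` in the **H3** column. Finally, the two datasets are joined on **H3** with an inner merge, so only cells present in both tables are kept.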

In [41]:
population_by_area = pd.merge(coverage, population_h3, on='H3', how='inner')

## Visualising the data


### Creating Filters

First, let's add filters to the dataset. It would be good to filter by urban area name, body weight and sex, so this is where we create Streamlit components: three select boxes (drop-down lists), laid out in 3 columns.

Each select box is populated with the distinct values from the newly created DataFrame.

In [42]:
import streamlit as st

st.title('POPULATION HEALTH BY URBAN AREA')

# Filters
col1, col2, col3 = st.columns(3)
with col1:
    urban_area = st.selectbox('Select Urban Area:', population_by_area['NAME1_TEXT'].unique())
with col2:
    body_weight = st.selectbox('Select Body Weight:', population_by_area['BODY_WEIGHT'].unique())
with col3:
    SEX = st.selectbox('Select Gender:', population_by_area['SEX'].unique())

df = population_by_area[
    (population_by_area['NAME1_TEXT'] == urban_area) &
    (population_by_area['BODY_WEIGHT'] == body_weight) &
    (population_by_area['SEX'] == SEX)
]

st.dataframe(df)

2025-05-26 21:30:53.512 
  command:

    streamlit run /usr/local/lib/python3.11/dist-packages/colab_kernel_launcher.py [ARGUMENTS]
2025-05-26 21:30:53.529 Session state does not function when running a script without `streamlit run`


DeltaGenerator()

### Create Metrics
Now that we have filtered to the right areas, let's summarise the results and create metrics. Again, columns are used to lay out the metrics across the page.

In [43]:

# Metrics
pop_metrics = df.agg({
    'TOTAL_POPULATION':'sum',
    'CANCER':'sum',
    'DIABETES':'sum',
    'COPD':'sum',
    'ASTHMA':'sum',
    'HYPERTENSION':'sum'
})

col1, col2, col3, col4, col5, col6 = st.columns(6)
# Cast the NumPy scalars to plain Python ints, a type st.metric documents as accepted
col1.metric('Total Population', int(pop_metrics['TOTAL_POPULATION']))
col2.metric('Cancer Sufferers', int(pop_metrics['CANCER']))
col3.metric('Diabetics', int(pop_metrics['DIABETES']))
col4.metric('COPD Sufferers', int(pop_metrics['COPD']))
col5.metric('Asthmatics', int(pop_metrics['ASTHMA']))
col6.metric('Hypertension', int(pop_metrics['HYPERTENSION']))



DeltaGenerator()

### Creating a map

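You will now create an H3 map that visualises the hexagons for whichever urban extent is selected in the filters. NB: the fill colour is given in RGB format, so you need to derive the RGB values from fields in the data. This example uses the total population field: the red and green channels are set to `255 - TOTAL_POPULATION`, so more populous cells appear a deeper blue. The expression falls outside the valid 0 to 255 range once **TOTAL_POPULATION** exceeds 255, so further calculations may be needed to control how the colours are presented.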

In [44]:
# Map
if not df.empty:
    LAT = df['LAT'].mean()
    LON = df['LON'].mean()
    layer = pdk.Layer(
        "H3HexagonLayer",
        df,
        pickable=True,
        stroked=True,
        filled=True,
        extruded=False,
        get_hexagon="H3",
        get_fill_color="[255 - TOTAL_POPULATION, 255-TOTAL_POPULATION, 255]",
        get_line_color=[1, 1, 1],
        line_width_min_pixels=1,
    )
    view_state = pdk.ViewState(latitude=LAT, longitude=LON, zoom=12, bearing=0, pitch=0)
    r = pdk.Deck(
        map_style=None,
        layers=[layer],
        initial_view_state=view_state,
        tooltip={
            "html": "Total Population: {TOTAL_POPULATION}<br>"
                    "Total Cancer: {CANCER}<br>"
                    "Total Diabetes: {DIABETES}<br>"
                    "Total COPD: {COPD}<br>"
                    "Total Asthma: {ASTHMA}<br>"
                    "Total Hypertension: {HYPERTENSION}"
        }
    )
    st.pydeck_chart(r)
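
The fill colour in the cell above subtracts `TOTAL_POPULATION` directly from 255, which only behaves well for values between 0 and 255. As a hedged sketch of the "further calculations" mentioned earlier, the function below rescales a value to a white-to-blue ramp and clamps it to the valid range. The name `population_to_rgb` and the example range of 0 to 500 are assumptions for illustration, not part of the original notebook.

```python
def population_to_rgb(value, v_min, v_max):
    """Map value to a white-to-blue [r, g, b] colour, clamped to [v_min, v_max]."""
    if v_max <= v_min:
        return [255, 255, 255]  # degenerate range: draw every cell white
    t = (value - v_min) / (v_max - v_min)  # 0.0 at v_min, 1.0 at v_max
    t = max(0.0, min(1.0, t))              # clamp values outside the range
    level = round(255 * (1 - t))           # red and green fade as value grows
    return [level, level, 255]

print(population_to_rgb(0, 0, 500))    # [255, 255, 255]
print(population_to_rgb(500, 0, 500))  # [0, 0, 255]
print(population_to_rgb(900, 0, 500))  # [0, 0, 255] (clamped)
```

One possible way to use it is to store the result in a new column, for example `df = df.assign(FILL_COLOR=[population_to_rgb(v, lo, hi) for v in df['TOTAL_POPULATION']])` with `lo` and `hi` taken from the filtered data, and then pass `get_fill_color="FILL_COLOR"` to the layer. That wiring is untested here and should be checked against the pydeck version in use.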



# Switching to Streamlit to view the dashboard

## Option 1 - Use pyngrok to tunnel the Streamlit app from Colab to the web
This is not suitable for production, but it works well for demos.

After running the cell, click the printed URL to view your app.

In [47]:
!pip install streamlit pyngrok

import os
from pyngrok import ngrok

# Write your Streamlit script to a file
with open('app.py', 'w') as f:
    f.write("""
import streamlit as st
st.title("Hello Streamlit from Colab!")
""")  # Replace with your actual script content

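# ngrok needs a verified account; replace this placeholder with your own authtoken
ngrok.set_auth_token('YOUR_NGROK_AUTHTOKEN')

# Start Streamlit in the background
os.system('streamlit run app.py &')

# Get a public URL via ngrok; the local port is passed as the first argument
public_url = ngrok.connect('8501')
print('Streamlit app available at:', public_url)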




Note: ngrok requires a verified account and an authtoken. Without them, the cell fails with `PyngrokNgrokError` (authentication failed, `ERR_NGROK_4018`). Sign up for an account at https://dashboard.ngrok.com/signup and install your authtoken from https://dashboard.ngrok.com/get-started/your-authtoken.
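
The `app.py` written in Option 1 is only a placeholder. One way to build the real script, sketched here under assumptions, is to collect the code from the earlier cells as strings, check that the combined text is valid Python, and only then write the file. The helper `write_app` and the shortened `cells` list are hypothetical. Notebook-only lines such as `!pip install ...` and the Colab `files.upload()` call must be left out, and the two CSV files need to sit next to `app.py`.

```python
import ast
from pathlib import Path

def write_app(code_cells, path='app.py'):
    """Join code snippets into one script, check its syntax, then save it."""
    source = '\n\n'.join(cell.strip() for cell in code_cells) + '\n'
    # Raises SyntaxError (for example on a '!pip ...' line) before writing.
    ast.parse(source)
    Path(path).write_text(source, encoding='utf-8')
    return path

# Hypothetical, shortened cell contents; paste the real cell code here.
cells = [
    "import streamlit as st",
    "st.title('POPULATION HEALTH BY URBAN AREA')",
]
print(write_app(cells))
```

The syntax check only catches text that is not valid Python; it does not confirm that the app runs, so it is still worth starting it once with `streamlit run app.py`.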

## Option 2 - If You’re On Your Own Computer

1. Download your app script (e.g., app.py) to your computer.
2. Open a terminal or command prompt on your computer.
3. Navigate to the folder with app.py.
4. Run:

   ```bash
   streamlit run app.py
   ```

5. Visit http://localhost:8501 in your browser.