# Sample code to access and plot data from Socialstyrelsen's databases

### Anders Ledberg

Socialstyrelsen (the National Board of Health and Welfare) provides an API that we can use to fetch data from the databases it holds. See https://sdb.socialstyrelsen.se/sdbapi.aspx (in Swedish).
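
All result URLs used in this notebook follow the same path pattern, so it can help to see that pattern spelled out. The sketch below is only inferred from the URLs that appear in the examples here, not taken from the API documentation, and `build_url` is a hypothetical helper name introduced for illustration. It does not contact the API.

```python
# Base path shared by all URLs in this notebook.
BASE = "http://sdb.socialstyrelsen.se/api/v1/sv"

def build_url(database, matt, diagnos, region="0", kon="1,2", alder=None):
    """Assemble a result URL from its parts (pattern inferred from this notebook)."""
    parts = [BASE, database, "resultat",
             "matt", str(matt),
             "diagnos", str(diagnos),
             "region", str(region),
             "kon", str(kon)]
    # The age filter only appears in some of the URLs, so it is optional here.
    if alder is not None:
        parts += ["alder", str(alder)]
    return "/".join(parts)

url_suicides = build_url("dodsorsaker", matt=1, diagnos="2026")
url_f19 = build_url("diagnoserislutenvard", matt=6, diagnos="F19", alder=19)
print(url_suicides)
print(url_f19)
```

The two URLs built here are the ones used in Example 1 and Example 3.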

First we import the libraries we need.

In [None]:
import pandas as pd
import json
from matplotlib import pyplot as plt
import numpy as np

Next we fetch the age groups used in the registry (needed for Example 2).

In [None]:
## preamble to replace the age indices with actual age range
url="http://sdb.socialstyrelsen.se/api/v1/sv/dodsorsaker/alder"
agevar=pd.read_json(url)
agevar=dict(zip(agevar['id'],agevar['text']))
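
To see what the lookup built above does without calling the API, here is a minimal sketch on a made-up table. The column names `id` and `text` are the ones used in the cell above; the ids and age labels are invented for illustration and may not match what the API returns.

```python
import pandas as pd

# Made-up stand-in for the API response; real ids and labels may differ.
sample = pd.DataFrame({"id": [1, 2, 3], "text": ["0-4", "5-9", "10-14"]})

# Same construction as in the notebook: a dict from age-group id to label.
age_labels = dict(zip(sample["id"], sample["text"]))

# Translate a list of ids into readable labels.
labels = [age_labels[i] for i in [3, 1]]
print(labels)
```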

## Example 1: Annual suicides

Here we use data from the cause of death registry to investigate whether the number of suicides per year has changed over the last 25 years. For more information about the registry, see https://www.socialstyrelsen.se/statistik-och-data/register/dodsorsaksregistret/ (only in Swedish).

In [None]:
## for example, let's look at deaths deemed to be suicides (code is 2026)
url="http://sdb.socialstyrelsen.se/api/v1/sv/dodsorsaker/resultat/matt/1/diagnos/2026/region/0/kon/1,2"
dat=pd.read_json(url)

In [None]:
## reshape the data into a pandas dataframe
df=pd.DataFrame(dat['data'].to_list())
## create a variable that has the right format
df['count']=df['varde'].apply(pd.to_numeric)

## change the values for sex (map avoids writing strings into an integer column)
df['konId']=df['konId'].map({1:"men",2:"women"})
df.head(6)
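
The reshaping step can be tried offline on a few hand-written records. This is only a sketch: the field names (`ar`, `konId`, `alderId`, `varde`) are the ones used in this notebook, but the values are made up, and the real entries of `dat['data']` may carry more fields.

```python
import pandas as pd

# Made-up records shaped like the entries of dat['data']; values are illustrative only.
records = [
    {"ar": 2020, "konId": 1, "alderId": 5, "varde": "12"},
    {"ar": 2020, "konId": 2, "alderId": 5, "varde": "4"},
    {"ar": 2021, "konId": 1, "alderId": 5, "varde": "10"},
]

# A list of dicts becomes one row per record, one column per key.
toy = pd.DataFrame(records)

# Convert the text values to numbers and the sex codes to labels.
toy["count"] = pd.to_numeric(toy["varde"])
toy["sex"] = toy["konId"].map({1: "men", 2: "women"})
print(toy[["ar", "sex", "count"]])
```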

Next we can plot the data, for men and women separately.

In [None]:
## sum over age groups
gdf=pd.DataFrame(df.groupby(['ar','konId'])['count'].sum())

## set size of figure text 
plt.rc('axes', titlesize=22) 
plt.rc('axes', labelsize=18) 
## make a dataframe suitable for plotting
dum = gdf.pivot_table(index='ar', columns='konId', values=['count'])
ax=dum.plot(marker="*",figsize=(12,8))
ax.set(xlabel="year", ylabel="number of cases")
ax.legend(['men','women'])
plt.title("Suicides, all ages")
plt.grid()
plt.show()
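
The step that turns the long table into one column per sex can be checked on toy numbers without plotting. The sketch below uses `groupby` followed by `unstack`, which is an alternative to the `pivot_table` call above rather than the same code; the data are made up.

```python
import pandas as pd

# Made-up counts: two age groups per sex in 2020, one in 2021.
toy = pd.DataFrame({
    "ar": [2020, 2020, 2020, 2020, 2021, 2021],
    "konId": ["men", "men", "women", "women", "men", "women"],
    "count": [3, 5, 1, 2, 7, 4],
})

# Sum over age groups, then move the sex level into the columns.
wide = toy.groupby(["ar", "konId"])["count"].sum().unstack("konId")
print(wide)
```

Calling `wide.plot(marker="*")` on the result would draw one line per column, as in the figure above.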


## Example 2: Age distribution of suicides

In this example we instead look at how suicides vary by age group. To do this we use the number of cases per 1000 inhabitants as the measure.

In [None]:
## here we need to reload the data with another measure (matt/2 instead of matt/1)
url="http://sdb.socialstyrelsen.se/api/v1/sv/dodsorsaker/resultat/matt/2/diagnos/2026/region/0/kon/1,2"
## alternative diagnoses: uncomment one of the lines below to use it instead
## C-19 (diagnosis code U07)
##url="http://sdb.socialstyrelsen.se/api/v1/sv/dodsorsaker/resultat/matt/2/diagnos/U07/region/0/kon/1,2"
## X42
##url="http://sdb.socialstyrelsen.se/api/v1/sv/dodsorsaker/resultat/matt/2/diagnos/X42/region/0/kon/1,2"
dat=pd.read_json(url)

## reshape the data into a pandas dataframe
df=pd.DataFrame(dat['data'].to_list())
df['varde']=df['varde'].str.replace(",",".")
## create a variable that has the right format
df['fraction']=df['varde'].apply(pd.to_numeric)/100

## change the values for sex (map avoids writing strings into an integer column)
df['konId']=df['konId'].map({1:"men",2:"women"})

gdf=pd.DataFrame(df.groupby(['alderId','konId'])['fraction'].mean())
dum = gdf.pivot_table(index='alderId', columns='konId', values=['fraction'])
dum=dum.assign(age=[agevar[i] for i in dum.index])
dum.columns=dum.columns.map(','.join)
ax=dum.plot(x="age,",kind="bar",figsize=(12,8))

plt.rc('axes', titlesize=22) 
plt.rc('axes', labelsize=18) 

ax.set(xlabel="age group", ylabel="cases per 1000 people")
ax.legend(['men','women'])
plt.title("Suicides, average rates for 1997-2022")
plt.show()
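
The values for this measure arrive as text with a decimal comma, which is why the cell above replaces "," with "." before converting. A minimal sketch of that conversion on made-up strings, followed by the same division by 100 that the notebook applies:

```python
import pandas as pd

# Made-up values written with a decimal comma, as in the 'varde' column.
raw = pd.Series(["12,5", "3,0", "0,75"])

# Swap the decimal comma for a point, then parse as numbers.
values = pd.to_numeric(raw.str.replace(",", ".", regex=False))

# Rescale in the same way as the notebook does.
rescaled = values / 100
print(rescaled.tolist())
```

Without the replacement, `pd.to_numeric` would raise an error on strings such as "12,5".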


## Example 3: Inpatient care occasions with F19 diagnosis

In this example we will instead use data from the Patient Registry (https://www.socialstyrelsen.se/en/statistics-and-data/registers/national-patient-register/) and look at care occasions with a main diagnosis indicative of "Mental and behavioural disorders due to multiple drug use and use of other psychoactive substances" (see https://icd.who.int/browse10/2019/en#/F19).


In [None]:
## note that the URL now points to another database (diagnoserislutenvard)
url="http://sdb.socialstyrelsen.se/api/v1/sv/diagnoserislutenvard/resultat/matt/6/diagnos/F19/region/0/kon/1,2/alder/19"
##pd.set_option('display.max_rows', 100)
dat=pd.read_json(url)

In [None]:
df=pd.DataFrame(dat['data'].to_list())
df['antal']=df['varde'].apply(pd.to_numeric)
## change the values for sex (map avoids writing strings into an integer column)
df['konId']=df['konId'].map({1:"men",2:"women"})
## sum over age groups
gdf=pd.DataFrame(df.groupby(['ar','konId'])['antal'].sum())

dum = gdf.pivot_table(index='ar', columns='konId', values=['antal'])
dum.columns=dum.columns.map(','.join)
ax=dum.plot(marker="*",figsize=(12,8))
ax.set(xlabel="year", ylabel="number of cases")
ax.legend(['men','women'])
plt.title("Inpatient care under F19 diagnosis")
plt.grid()
plt.show()
