# Residential segregation
Calculate DeSO-zone-based segregation indices on ethnic and income groups.

Two indicators will be used for each spatial zone:

1) Evenness
We define the evenness measure for each zone $ i $ as $ S_i = \frac{n}{2n-2}\sum_{q=1}^{n}\left|\tau_{qi}-\frac{1}{n}\right| $,
where $ n $ is the number of ethnic or income groups, e.g., income quantiles, and $ \tau_{qi} $ is the share of group $ q $ in the total population of $ i $. $ S_i $ ranges between 0 and 1: $ S_i = 0 $ means that all groups are equally represented in zone $ i $ (no segregation), while $ S_i = 1 $ means that only a single group lives in zone $ i $.

Ref: Moro E, Calacci D, Dong X, Pentland A. Mobility patterns are associated with experienced income segregation in large US cities. Nat Commun. 2021;12(1):4633. doi:[10.1038/s41467-021-24899-8](https://doi.org/10.1038/s41467-021-24899-8)
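
As a quick check of the evenness formula, the sketch below evaluates it for three hypothetical zones with four groups: one with equal shares, one with a single group, and one in between. The function name `evenness_index` and the share values are illustrative assumptions, not part of the analysis.

```python
def evenness_index(shares):
    """Evenness S_i for one zone, given the population shares of its n groups."""
    n = len(shares)
    # Sum of absolute deviations from the uniform share 1/n
    deviation = sum(abs(share - 1 / n) for share in shares)
    # The factor n / (2n - 2) rescales the sum to the range [0, 1]
    return n / (2 * n - 2) * deviation

# Hypothetical zones with four income groups
uniform = evenness_index([0.25, 0.25, 0.25, 0.25])  # all groups equally present
single = evenness_index([1.0, 0.0, 0.0, 0.0])       # only one group present
mixed = evenness_index([0.21, 0.21, 0.21, 0.37])    # one group over-represented

print(round(uniform, 6), round(single, 6), round(mixed, 6))  # 0.0 1.0 0.16
```

The two extremes reproduce the bounds stated above, and the mixed zone falls between them.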

2) Exposure (isolation)
We define the isolation measure of group $ q $ in each zone $ i $, $ II_{qi}= \frac{\tau_{qi}^2P_i}{P_q} $,
where $ q $ represents the minority ethnic group or the group with the lowest income, $ \tau_{qi} $ is the share of group $ q $ in the total population of $ i $, $ P_i $ is the total population of zone $ i $, and $ P_q $ is the total population of group $ q $ across all zones.

Ref: Silm S, Ahas R. The temporal variation of ethnic segregation in a city: Evidence from a mobile phone use dataset. Social Science Research. 2014;47:30-43. doi:[10.1016/j.ssresearch.2014.03.011](https://doi.org/10.1016/j.ssresearch.2014.03.011)
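
The per-zone isolation values are small because each one is a zone's contribution to a citywide total. Summed over all zones, $ II_{qi} $ gives the probability that a randomly chosen member of group $ q $ shares a zone with another member of $ q $. A minimal sketch with hypothetical counts (the function name and the numbers are assumptions for illustration):

```python
def isolation_contributions(group_counts, zone_totals):
    """Per-zone isolation II_qi = tau_qi^2 * P_i / P_q for one group q."""
    p_q = sum(group_counts)  # total population of group q over all zones
    contributions = []
    for x_i, p_i in zip(group_counts, zone_totals):
        tau = x_i / p_i  # share of group q in zone i
        contributions.append(tau ** 2 * p_i / p_q)
    return contributions

# Hypothetical data: three zones of 100 residents; group q has 10, 30 and 60 members
group_counts = [10, 30, 60]
zone_totals = [100, 100, 100]
ii = isolation_contributions(group_counts, zone_totals)
total_isolation = sum(ii)  # 0.1*0.1 + 0.3*0.3 + 0.6*0.6 = 0.46
```

Zones where the group is both concentrated and numerous contribute most to the total.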

In [1]:
%load_ext autoreload
%autoreload 2
%cd D:\mobi-social-segregation-se

D:\mobi-social-segregation-se


In [2]:
# Load libs
import pandas as pd
import geopandas as gpd
import sqlalchemy
from tqdm import tqdm
from lib import preprocess

In [3]:
# Data location
user = preprocess.keys_manager['database']['user']
password = preprocess.keys_manager['database']['password']
port = preprocess.keys_manager['database']['port']
db_name = preprocess.keys_manager['database']['name']
engine = sqlalchemy.create_engine(f'postgresql://{user}:{password}@localhost:{port}/{db_name}')

## 1 Extract socioeconomic variables
1) Ethnicity groups.
1-1) Foreign vs. Swedish background. Persons with a foreign background are defined as persons who were born abroad, or born in Sweden with two foreign-born parents. Persons with a Swedish background are defined as persons who were born in Sweden to two Swedish-born parents or one Swedish-born and one foreign-born parent.
1-2) Region of birth. Sweden, Europe except Sweden, and the rest of the world incl. unknown. Europe except Sweden = The Nordic countries, EU countries and the rest of Europe including Russia and Turkey.

2) Income groups.
Net income is the sum of all taxable and tax-free income of a person minus tax and other negative transfers (e.g., repaid student loans).

### 1.1 Income groups

In [4]:
df_income = pd.read_csv("dbs/DeSO/income_2019.csv")
df_income.head()

Unnamed: 0,region,q1,q2,q3,q4,net income population
0,0114A0010,21,21,21,37,605
1,0114C1010,15,17,25,44,1130
2,0114C1020,15,19,23,43,1125
3,0114C1030,17,20,24,39,1726
4,0114C1040,25,28,29,18,1789


### 1.2 Region of birth

In [5]:
df_rb = pd.read_csv("dbs/DeSO/region of birth_2019.csv")
df_rb.head()

Unnamed: 0,region,region of birth,count
0,0114A0010,Sweden,668
1,0114A0010,Europe except Sweden,98
2,0114A0010,Other,24
3,0114A0010,Total,790
4,0114C1010,Sweden,1293


### 1.3 Foreign/Swedish background

In [6]:
df_b = pd.read_csv("dbs/DeSO/background_2019.csv")
df_b.head()

Unnamed: 0,region,background,count
0,0114A0010,Swedish background,642
1,0114A0010,Foreign background,148
2,0114A0010,Total,790
3,0114C1010,Swedish background,1190
4,0114C1010,Foreign background,418


In [7]:
df_deso = df_b.loc[df_b.background == 'Total', ['region', 'count']].rename(columns={'count': 'population'})
df_deso.head()

Unnamed: 0,region,population
2,0114A0010,790
5,0114C1010,1608
8,0114C1020,1610
11,0114C1030,2365
14,0114C1040,2346


In [8]:
df_deso.population.sum()

10327589

## 2 Evenness

In [9]:
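def evenness(row, n, var_list):
    # Sum of absolute deviations of each group share from the uniform share 1/n
    suma = sum(abs(row[var] - 1/n) for var in var_list)
    # Rescale so that the index lies between 0 and 1
    s_i = n/(2*n - 2) * suma
    return s_i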
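
The row-wise function is applied with `DataFrame.apply`, which can be slow on large tables. A column-wise variant is sketched below as one possible alternative. `evenness_vectorized` and the toy data are hypothetical and not used elsewhere in this notebook.

```python
import pandas as pd

def evenness_vectorized(df, var_list):
    """Column-wise equivalent of applying the evenness formula to every row."""
    n = len(var_list)
    # Absolute deviation of every share from 1/n, summed across the group columns
    deviation = (df[var_list] - 1 / n).abs().sum(axis=1)
    return n / (2 * n - 2) * deviation

# Hypothetical shares for two zones and two groups
toy = pd.DataFrame({'a': [0.5, 0.9], 'b': [0.5, 0.1]})
toy['S'] = evenness_vectorized(toy, ['a', 'b'])
```

Here the number of groups is taken from `var_list`, so `n` cannot drift out of sync with the columns.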

### 2.1 Income

In [10]:
n = 4
inc_var_list = ['q1', 'q2', 'q3', 'q4']
# Convert percentages to shares. Run this cell only once: re-running divides by 100 again.
for var in inc_var_list:
    df_income.loc[:, var] /= 100
df_income.loc[:, 'S'] = df_income.apply(lambda row: evenness(row, n=n, var_list=inc_var_list), axis=1)
df_income.head()

Unnamed: 0,region,q1,q2,q3,q4,net income population,S
0,0114A0010,0.21,0.21,0.21,0.37,605,0.16
1,0114C1010,0.15,0.17,0.25,0.44,1130,0.246667
2,0114C1020,0.15,0.19,0.23,0.43,1125,0.24
3,0114C1030,0.17,0.2,0.24,0.39,1726,0.186667
4,0114C1040,0.25,0.28,0.29,0.18,1789,0.093333
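
Because the evenness formula assumes that the group shares of a zone sum to 1, a sanity check on the shares can catch scaling mistakes, such as dividing by 100 twice. The helper `check_shares`, its tolerance, and the toy data below are assumptions for illustration. The tolerance allows for rounding in published percentages.

```python
import pandas as pd

def check_shares(df, var_list, tol=0.02):
    """Return the rows whose group shares do not sum to 1 within `tol`."""
    total = df[var_list].sum(axis=1)
    return df.loc[(total - 1).abs() > tol]

# Hypothetical data: zone A is consistent, zone B sums to 1.2
toy = pd.DataFrame({
    'region': ['A', 'B'],
    'q1': [0.21, 0.30], 'q2': [0.21, 0.30],
    'q3': [0.21, 0.30], 'q4': [0.37, 0.30],
})
bad_rows = check_shares(toy, ['q1', 'q2', 'q3', 'q4'])
```

An empty result means every zone passed the check.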


### 2.2 Region of birth

In [11]:
df_rb = df_rb.pivot(index='region', columns='region of birth', values='count').reset_index()
df_rb.head()

region of birth,region,Europe except Sweden,Other,Sweden,Total
0,0114A0010,98,24,668,790
1,0114C1010,163,152,1293,1608
2,0114C1020,135,155,1320,1610
3,0114C1030,189,222,1954,2365
4,0114C1040,399,483,1464,2346


In [12]:
rb_var_list = ['Europe except Sweden', 'Other', 'Sweden']
for var in rb_var_list:
    df_rb.loc[:, var] /= df_rb.loc[:, 'Total']
df_rb.head()

region of birth,region,Europe except Sweden,Other,Sweden,Total
0,0114A0010,0.124051,0.03038,0.84557,790
1,0114C1010,0.101368,0.094527,0.804104,1608
2,0114C1020,0.083851,0.096273,0.819876,1610
3,0114C1030,0.079915,0.093869,0.826216,2365
4,0114C1040,0.170077,0.205882,0.624041,2346


In [13]:
n = 3
df_rb.loc[:, 'S'] = df_rb.apply(lambda row: evenness(row, n=n, var_list=rb_var_list), axis=1)
df_rb.head()

region of birth,region,Europe except Sweden,Other,Sweden,Total,S
0,0114A0010,0.124051,0.03038,0.84557,790,0.768354
1,0114C1010,0.101368,0.094527,0.804104,1608,0.706157
2,0114C1020,0.083851,0.096273,0.819876,1610,0.729814
3,0114C1030,0.079915,0.093869,0.826216,2365,0.739323
4,0114C1040,0.170077,0.205882,0.624041,2346,0.436061


### 2.3 Foreign/Swedish background

In [14]:
df_b = df_b.pivot(index='region', columns='background', values='count').reset_index()
df_b.head()

background,region,Foreign background,Swedish background,Total
0,0114A0010,148,642,790
1,0114C1010,418,1190,1608
2,0114C1020,453,1157,1610
3,0114C1030,567,1798,2365
4,0114C1040,1178,1168,2346


In [15]:
b_var_list = ['Foreign background', 'Swedish background']
for var in b_var_list:
    df_b.loc[:, var] /= df_b.loc[:, 'Total']
df_b.head()

background,region,Foreign background,Swedish background,Total
0,0114A0010,0.187342,0.812658,790
1,0114C1010,0.25995,0.74005,1608
2,0114C1020,0.281366,0.718634,1610
3,0114C1030,0.239746,0.760254,2365
4,0114C1040,0.502131,0.497869,2346


In [16]:
n = 2
df_b.loc[:, 'S'] = df_b.apply(lambda row: evenness(row, n=n, var_list=b_var_list), axis=1)
df_b.head()

background,region,Foreign background,Swedish background,Total,S
0,0114A0010,0.187342,0.812658,790,0.625316
1,0114C1010,0.25995,0.74005,1608,0.4801
2,0114C1020,0.281366,0.718634,1610,0.437267
3,0114C1030,0.239746,0.760254,2365,0.520507
4,0114C1040,0.502131,0.497869,2346,0.004263


### 2.4 Put evenness measurement results together

In [19]:
df_list = []
for df, name in zip([df_income, df_rb, df_b], ['income', 'birth_region', 'background']):
    df.loc[:, 'var'] = name
    df_list.append(df.loc[:, ['region', 'S', 'var']])
df_evenness = pd.concat(df_list)
df_evenness.head()

Unnamed: 0,region,S,var
0,0114A0010,0.16,income
1,0114C1010,0.246667,income
2,0114C1020,0.24,income
3,0114C1030,0.186667,income
4,0114C1040,0.093333,income


## 3 Isolation

In [31]:
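def isolation(row, var, reverse=False, total=None, total_var=None):
    # row[var]: share of the group in the zone; row[total]: zone population (P_i);
    # total_var: population of the group over all zones (P_q)
    if reverse:
        # Use the complement share, e.g., 1 - share born in Sweden = share born abroad
        iso_i = (1 - row[var])**2 * row[total] / total_var
        # iso_i = (iso_i - 1 + row[var]) / row[var]
    else:
        iso_i = row[var]**2 * row[total] / total_var
        # iso_i = (iso_i - row[var]) / (1 - row[var])
    return iso_i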
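
With `reverse=True`, the function works on the complement of the given share, so the isolation of the foreign-born can be computed from the `Sweden` column alone. The sketch below shows the same computation in column-wise form. `isolation_column` and the toy numbers are hypothetical.

```python
import pandas as pd

def isolation_column(share, total, group_total):
    """Column-wise II_qi = tau_qi^2 * P_i / P_q for all zones at once."""
    return share ** 2 * total / group_total

# Hypothetical data: share born in Sweden and total population for two zones
toy = pd.DataFrame({'Sweden': [0.8, 0.5], 'Total': [1000, 2000]})
foreign_share = 1 - toy['Sweden']            # complement share, as with reverse=True
p_q = (foreign_share * toy['Total']).sum()   # total foreign-born population
toy['iso'] = isolation_column(foreign_share, toy['Total'], p_q)
```

In this toy case the second zone holds most of the group and has the higher share, so it dominates the total.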

In [32]:
P_var = sum(df_income.loc[:, 'q1'] * df_income.loc[:, 'net income population'])
df_income.loc[:, 'iso'] = df_income.apply(lambda row: isolation(row, var='q1', reverse=False, total='net income population', total_var=P_var), axis=1)
df_income.head()

Unnamed: 0,region,q1,q2,q3,q4,net income population,S,var,iso
0,0114A0010,0.21,0.21,0.21,0.37,605,0.16,income,1.4e-05
1,0114C1010,0.15,0.17,0.25,0.44,1130,0.246667,income,1.3e-05
2,0114C1020,0.15,0.19,0.23,0.43,1125,0.24,income,1.3e-05
3,0114C1030,0.17,0.2,0.24,0.39,1726,0.186667,income,2.6e-05
4,0114C1040,0.25,0.28,0.29,0.18,1789,0.093333,income,5.7e-05


In [40]:
P_var = sum(df_b.loc[:, 'Foreign background'] * df_b.loc[:, 'Total'])
df_b.loc[:, 'iso'] = df_b.apply(lambda row: isolation(row, var='Foreign background', reverse=False, total='Total', total_var=P_var), axis=1)
df_b.head()

background,region,Foreign background,Swedish background,Total,S,var,iso
0,0114A0010,0.187342,0.812658,790,0.625316,background,1.1e-05
1,0114C1010,0.25995,0.74005,1608,0.4801,background,4.1e-05
2,0114C1020,0.281366,0.718634,1610,0.437267,background,4.8e-05
3,0114C1030,0.239746,0.760254,2365,0.520507,background,5.2e-05
4,0114C1040,0.502131,0.497869,2346,0.004263,background,0.000224


In [42]:
P_var = sum((1 - df_rb.loc[:, 'Sweden']) * df_rb.loc[:, 'Total'])
df_rb.loc[:, 'iso'] = df_rb.apply(lambda row: isolation(row, var='Sweden', reverse=True, total='Total', total_var=P_var), axis=1)
df_rb.head()

region of birth,region,Europe except Sweden,Other,Sweden,Total,S,var,iso
0,0114A0010,0.124051,0.03038,0.84557,790,0.768354,birth_region,9e-06
1,0114C1010,0.101368,0.094527,0.804104,1608,0.706157,birth_region,3.1e-05
2,0114C1020,0.083851,0.096273,0.819876,1610,0.729814,birth_region,2.6e-05
3,0114C1030,0.079915,0.093869,0.826216,2365,0.739323,birth_region,3.5e-05
4,0114C1040,0.170077,0.205882,0.624041,2346,0.436061,birth_region,0.000164


### 3.1 Put isolation measurement results together

In [43]:
df_list = []
for df, name in zip([df_income, df_rb, df_b], ['income', 'birth_region', 'background']):
    df.loc[:, 'var'] = name
    df_list.append(df.loc[:, ['region', 'iso', 'var']])
df_isolation = pd.concat(df_list)
df_isolation.head()

Unnamed: 0,region,iso,var
0,0114A0010,1.4e-05,income
1,0114C1010,1.3e-05,income
2,0114C1020,1.3e-05,income
3,0114C1030,2.6e-05,income
4,0114C1040,5.7e-05,income


## 4 Summarize residential segregation

In [44]:
df_seg = pd.merge(df_evenness, df_isolation, on = ['region', 'var'])
df_seg.head()

Unnamed: 0,region,S,var,iso
0,0114A0010,0.16,income,1.4e-05
1,0114C1010,0.246667,income,1.3e-05
2,0114C1020,0.24,income,1.3e-05
3,0114C1030,0.186667,income,2.6e-05
4,0114C1040,0.093333,income,5.7e-05
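
The default inner merge silently drops any `(region, var)` pair that is missing from one of the two tables. One way to make such losses visible is to use the `validate` and `indicator` options of `pd.merge`, sketched here on hypothetical toy tables:

```python
import pandas as pd

# Hypothetical long-format tables with one row per (region, var)
even = pd.DataFrame({'region': ['A', 'B'], 'var': ['income', 'income'], 'S': [0.1, 0.2]})
iso = pd.DataFrame({'region': ['A', 'B'], 'var': ['income', 'income'], 'iso': [1e-5, 2e-5]})

# validate raises an error if a key is duplicated on either side;
# indicator adds a '_merge' column showing where each row came from
merged = pd.merge(even, iso, on=['region', 'var'], how='outer',
                  validate='one_to_one', indicator=True)
unmatched = merged.loc[merged['_merge'] != 'both']
```

If `unmatched` is empty, both tables cover the same zones and variables.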


In [57]:
# Save the data to database
df_seg.to_sql('resi_segregation', engine, schema='public', index=False, method='multi', if_exists='replace', chunksize=10000)
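
To confirm that a table was written as expected, it can be read back and compared with the source DataFrame. The sketch below uses an in-memory SQLite database as a stand-in for the PostgreSQL engine, and the toy rows are hypothetical. With the real engine, the `schema` argument would be passed as in the cell above.

```python
import sqlite3
import pandas as pd

# Hypothetical rows in the same layout as the segregation table
toy = pd.DataFrame({
    'region': ['A', 'B'],
    'S': [0.16, 0.25],
    'var': ['income', 'income'],
    'iso': [1.4e-5, 1.3e-5],
})

con = sqlite3.connect(':memory:')  # stand-in database, not the project database
toy.to_sql('resi_segregation', con, index=False, if_exists='replace')
restored = pd.read_sql('SELECT * FROM resi_segregation', con)
con.close()
```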

### 4.1 Save data for visualization

In [60]:
df_seg_pivot = df_seg.pivot(index='region', columns='var', values=['S', 'iso']).reset_index()
df_seg_pivot.columns = ['region', 'S_b', 'S_rb', 'S_inc', 'iso_b', 'iso_rb', 'iso_inc']
df_seg_pivot.head()

Unnamed: 0,region,S_b,S_rb,S_inc,iso_b,iso_rb,iso_inc
0,0114A0010,0.625316,0.768354,0.16,1.1e-05,9e-06,1.4e-05
1,0114C1010,0.4801,0.706157,0.246667,4.1e-05,3.1e-05,1.3e-05
2,0114C1020,0.437267,0.729814,0.24,4.8e-05,2.6e-05,1.3e-05
3,0114C1030,0.520507,0.739323,0.186667,5.2e-05,3.5e-05,2.6e-05
4,0114C1040,0.004263,0.436061,0.093333,0.000224,0.000164,5.7e-05
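
Assigning the column names by position relies on the order in which `pivot` arranges the `var` values. A less fragile option, sketched below on hypothetical toy data, derives each name from the pivoted column labels. The `short` mapping is an assumption that mirrors the abbreviations used above.

```python
import pandas as pd

# Hypothetical long-format results for a single zone
toy = pd.DataFrame({
    'region': ['A', 'A', 'A'],
    'var': ['income', 'birth_region', 'background'],
    'S': [0.16, 0.77, 0.63],
    'iso': [1.4e-5, 9e-6, 1.1e-5],
})
short = {'background': 'b', 'birth_region': 'rb', 'income': 'inc'}

wide = toy.pivot(index='region', columns='var', values=['S', 'iso']).reset_index()
# Each column label is a (value, var) tuple; 'region' has an empty second element
wide.columns = [f"{value}_{short[var]}" if var else value for value, var in wide.columns]
```

Each name is built from its own label, so a change in column order cannot mislabel a measure.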


In [61]:
df_deso = gpd.GeoDataFrame.from_postgis(sql="""SELECT deso, befolkning, geom FROM zones;""", con=engine)
df_deso = pd.merge(df_deso.rename(columns={'deso': 'region'}), df_seg_pivot, on='region', how='outer')
df_deso.to_file('apps/interactive-residential-segregation-se/data/resi_segregation.shp')

In [62]:
df_deso.head()

Unnamed: 0,region,befolkning,geom,S_b,S_rb,S_inc,iso_b,iso_rb,iso_inc
0,0114A0010,790,"POLYGON ((661116.252 6606615.603, 661171.409 6...",0.625316,0.768354,0.16,1.1e-05,9e-06,1.4e-05
1,0114C1010,1608,"POLYGON ((666960.066 6598800.393, 666971.371 6...",0.4801,0.706157,0.246667,4.1e-05,3.1e-05,1.3e-05
2,0114C1020,1610,"POLYGON ((667034.814 6600076.634, 667032.984 6...",0.437267,0.729814,0.24,4.8e-05,2.6e-05,1.3e-05
3,0114C1120,2148,"POLYGON ((664638.646 6601646.294, 664580.192 6...",0.232775,0.556564,0.08,0.00012,9.3e-05,4.2e-05
4,0180C4390,1111,"POLYGON ((676903.235 6581075.211, 676413.795 6...",0.576958,0.723222,0.346667,1.9e-05,1.9e-05,2.2e-05
