<h3>Imports</h3>

In [1]:
import duckdb
import pandas as pd
import numpy as np
import random

<h3>Reading yearly economic status data</h3>

In [3]:
df = pd.read_csv('../../data/additionalData/economicStatusMod2.csv')
df

Unnamed: 0,District,Total: All usual residents aged 16 years and over,Economically active (excluding full-time students):In employment,Economically active (excluding full-time students): Unemployed,Economically active and a full-time student:In employment,Economically active and a full-time student: Unemployed,Economically inactive
0,Boston,57584,32943,1411,688,218,22324
1,East Lindsey,121420,54579,3241,1000,268,62332
2,Lincoln,87064,43552,2299,3163,1576,36474
3,North Kesteven,98017,55685,1728,1209,356,39039
4,South Holland,78998,44457,1758,777,217,31789
5,South Kesteven,118011,66909,2598,1286,392,46826
6,West Lindsey,79120,41326,1754,812,267,34961


<h3>Column modifications</h3>
We create two columns:
<ul>
    <li>"Employed", the sum of the columns:</li>
    <ul>
        <li>Economically active (excluding full-time students):In employment</li>
        <li>Economically active and a full-time student:In employment</li>
    </ul>
    <li>"Unemployed", the sum of the columns:</li>
    <ul>
        <li>Economically active (excluding full-time students): Unemployed</li>
        <li>Economically active and a full-time student: Unemployed</li>
    </ul>
</ul>

We keep the "Economically inactive" column as it is.
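
As a minimal sketch of this aggregation, the toy frame below uses shortened, hypothetical column names (not the real census headers) and sums every column whose name contains a given substring. Selecting with `DataFrame.filter(like=...)` is an assumption made for illustration; a list comprehension over `df.columns` works equally well.

```python
import pandas as pd

# Toy data with hypothetical, shortened column names (assumption: not the real headers).
toy = pd.DataFrame({
    "District": ["A", "B"],
    "Non-students: In employment": [100, 200],
    "Students: In employment": [10, 20],
    "Non-students: Unemployed": [5, 8],
    "Students: Unemployed": [1, 2],
})

# Select the source columns first, so the new totals are never matched by the filters.
employed_cols = toy.filter(like="In employment").columns
unemployed_cols = toy.filter(like="Unemployed").columns

# Row-wise sums of the matched columns give the two aggregated totals.
toy["Employed"] = toy[employed_cols].sum(axis=1)
toy["Unemployed"] = toy[unemployed_cols].sum(axis=1)

print(toy[["District", "Employed", "Unemployed"]])
```

Collecting both column lists before assigning either total keeps the selection independent of the order in which the new columns are created.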

In [4]:
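# Sum the two "In employment" columns (non-students and full-time students).
cols = [s for s in df.columns.to_list() if 'In employment' in s]

df['Employed'] = df[cols[0]] + df[cols[1]]

# Sum the two original "Unemployed" columns in the same way.
cols = [s for s in df.columns.to_list() if 'Unemployed' in s]

df['Unemployed'] = df[cols[0]] + df[cols[1]]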

<h3>Keeping relevant columns</h3>

In [5]:
df = df[['District', 'Economically inactive', 'Employed', 'Unemployed']]


<h3>Retrieving months in crimesPrices table</h3>

In [6]:
con = duckdb.connect("../../data/exploitation/crimesPrices.db")
months = con.execute("SELECT Distinct Month FROM crimesPrices").df()
con.close()

In [6]:
df

Unnamed: 0,District,Economically inactive,Employed,Unemployed
0,Boston,22324,33631,1629
1,East Lindsey,62332,55579,3509
2,Lincoln,36474,46715,3875
3,North Kesteven,39039,56894,2084
4,South Holland,31789,45234,1975
5,South Kesteven,46826,68195,2990
6,West Lindsey,34961,42138,2021


<h3>Generating data</h3>
For each district, we generate n values for each of the columns Economically inactive, Employed and Unemployed, where n is the number of distinct months in the crimesPrices table.

To generate the n values for a district, we take the original value and multiply it by a random factor that introduces an error of at most ±5%.

For example, say we want to calculate the n values for Employed in the district Boston. First, we take the original Employed value for Boston, 33631.

Next, for each of the n months, we apply a random error of at most ±5%. Say that for the month 2022-05 we draw a random factor of 1.02 = 1 + 0.02 (the random error). The calculated value would be 33631 × 1.02 = 34303.62. Finally, as we cannot have 0.62 people, we floor the number, so the final value is 34303.

We do this for all districts and each of the columns Economically inactive, Employed and Unemployed.
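
The Boston example can be checked with a short sketch. The helper name `perturb`, the fixed seed, and the choice of 12 months are assumptions made for illustration only; the notebook draws unseeded values and takes n from the crimesPrices table.

```python
import numpy as np

def perturb(value, n, rng):
    """Return n floored copies of `value`, each scaled by a uniform factor in [0.95, 1.05)."""
    factors = rng.uniform(0.95, 1.05, size=n)
    return np.floor(value * factors).astype(int)

# Worked example from the text: a factor of 1.02 applied to Boston's Employed value.
boston_employed = 33631
example = int(np.floor(boston_employed * 1.02))
print(example)

# Hypothetical seed so the sketch is repeatable; 12 stands in for the number of months n.
rng = np.random.default_rng(0)
samples = perturb(boston_employed, 12, rng)
print(samples.min(), samples.max())
```

Every generated value stays between the floored 95% and 105% of the original, which is a quick sanity check to run on the generated table as well.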

In [7]:
# Cross join: one row per (district, month) pair, keeping districts in their original order.
# (DataFrame.append was removed in pandas 2.0; merge(how='cross') needs pandas >= 1.2.)
finaldf = df.merge(months, how='cross')[['Month'] + list(df.columns)]

# Scale each numeric column by an independent uniform factor in [0.95, 1.05), then floor.
for att in finaldf.columns[2:]:
    noise = np.random.uniform(0.95, 1.05, size=len(finaldf))
    finaldf[att] = np.floor(finaldf[att] * noise).astype(int)

finaldf

Unnamed: 0,Month,District,Economically inactive,Employed,Unemployed
0,2022-05,Boston,22359,34826,1555
1,2022-04,Boston,22295,33244,1579
2,2022-03,Boston,23062,33775,1558
3,2022-02,Boston,21774,34109,1643
4,2022-01,Boston,21467,34849,1665
...,...,...,...,...,...
79,2021-10,West Lindsey,33279,43686,2007
80,2021-09,West Lindsey,34555,43732,2081
81,2021-08,West Lindsey,36189,41255,2033
82,2021-07,West Lindsey,36393,41119,1961


In [8]:
finaldf.to_csv('../../data/landing/temporal/economicStatus.csv', index=False)