# DATA PROJECT

## Part 1: Introduction

In this project we examine the development in disposable income for the different deciles in Denmark in the period from 2010 to 2021. We do this using data from Statistikbanken, more precisely the IFOR31 table.

First, we use the data to create a Lorenz Diagram (Plot 1). This is done by plotting a Lorenz Curve (the blue curve) and a 45-degree line (the orange line) in the same diagram. The x-axis shows the cumulative share of the population, while the y-axis shows the cumulative share of disposable income in percent. This makes it easy to see, for example, how much of the total disposable income belongs to the poorest 20% of the population.

The closer our Lorenz Curve is to the 45-degree line, the more equal the distribution of income is. If income is distributed perfectly equally, the Lorenz Curve lies on the 45-degree line. If there is perfect inequality (one person has all the income), it follows the outer axes.
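
The two extreme cases described above can be checked with a minimal sketch. The helper `lorenz_points` and the four-person income lists are our own illustrative assumptions and are not part of the `dataproject` module or the IFOR31 data.

```python
import numpy as np

def lorenz_points(incomes):
    """Return population share and cumulative income share in percent, both starting at 0."""
    sorted_incomes = np.sort(np.asarray(incomes, dtype=float))
    # Cumulative income with a leading 0 so the curve starts at the origin
    cum_income = np.concatenate(([0.0], np.cumsum(sorted_incomes)))
    income_share = cum_income / cum_income[-1] * 100
    population_share = np.linspace(0, 100, len(sorted_incomes) + 1)
    return population_share, income_share

# Perfect equality (made-up numbers): every point lies on the 45-degree line
pop_equal, inc_equal = lorenz_points([100, 100, 100, 100])

# Perfect inequality (made-up numbers): one person holds everything,
# so the curve stays at 0 and only jumps to 100 at the very end
pop_unequal, inc_unequal = lorenz_points([0, 0, 0, 400])
```

Plotting `inc_equal` against `pop_equal` would reproduce the 45-degree line, while `inc_unequal` traces the outer axes.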

Second, we make a diagram where it is possible to compare the average disposable income for each decile group in the period 2010 to 2021. This makes it easy to compare separate decile groups with each other and see the difference in development over the period.

Imports and set magics:

In [157]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import ipywidgets as widgets
from matplotlib_venn import venn2
from scipy.stats import norm
%matplotlib inline

# autoreload modules when code is run
%load_ext autoreload
%autoreload 2

# user written modules
import dataproject as dp

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


## Part 2: Read and clean data

[Data disclaimer](#heading-2)

When interpreting the 1st decile, it should be noted that individuals with significant losses, such as in stocks or from self-employment, can lower the income in the 1st decile. Particularly in the years following the financial crisis, extreme fluctuations occur. 

First we import the data from our IFOR31 table.

In [158]:
# 1. Setting the file name of the Excel export of the IFOR31 table
LorenzData = "Indkomst Data.xlsx" 

# 2. Reading the data into a DataFrame, skipping the first two rows of the sheet
LData = pd.read_excel(LorenzData, skiprows=2) 

# 3. Removing first unnamed column
LData.drop("Unnamed: 0", axis=1, inplace=True) 

# 4. Renaming columns
LData.rename(columns = {"Unnamed: 1":"Decil"}, inplace=True)

# 5. Removing rows with missing values
LData = LData.dropna()

# 6. Removing decimals by converting every column except "Decil" to integers
LData = LData.astype({col: int for col in LData.columns[1:]})

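
Since the Excel file is not included here, the cleaning steps above can be tried on a small in-memory table. This is only a sketch: the layout of `raw` (an empty first column, an unnamed label column and an empty trailing row) is an assumption about what the sheet looks like after `pd.read_excel`, and only the four income values are copied from the table in Part 3.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the raw sheet; the layout is assumed, not taken from the file
raw = pd.DataFrame({
    "Unnamed: 0": [np.nan, np.nan, np.nan],
    "Unnamed: 1": ["1. decil", "2. decil", np.nan],
    "2010": [58517.0, 133314.0, np.nan],
    "2011": [61405.0, 135247.0, np.nan],
})

clean = (
    raw.drop(columns="Unnamed: 0")                 # remove the empty first column
       .rename(columns={"Unnamed: 1": "Decil"})    # name the label column
       .dropna()                                   # drop rows with missing values
)

# Convert every column except "Decil" to integers
clean = clean.astype({col: int for col in clean.columns[1:]})
```

Using `astype` with a column-to-type mapping returns a new DataFrame, so the integer conversion does not depend on how a given pandas version handles assignment through `iloc`.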


## Part 3: Preparing data

We start by calculating the cumulative income for each year.

In [159]:
# 1. Creating an array of the years 2010-2021
Years = np.linspace(2010,2021,num=12) 

# 2. Converting the years from floats to integers
Years_int = Years.astype(int) 

# 3. Using a for loop to calculate the cumulative income for each year
for i in Years_int:
    Col_name = "Cum_" + str(i) # Name of the new column for this year
    LData[Col_name] = LData[str(i)].cumsum() # Cumulative income across the deciles

# 4. Showing data
LData    

Unnamed: 0,Decil,2010,2011,2012,2013,2014,2015,2016,2017,2018,...,Cum_2012,Cum_2013,Cum_2014,Cum_2015,Cum_2016,Cum_2017,Cum_2018,Cum_2019,Cum_2020,Cum_2021
0,1. decil,58517,61405,72143,75689,70459,77728,75853,78128,78721,...,72143,75689,70459,77728,75853,78128,78721,83792,88188,95473
1,2. decil,133314,135247,138299,140042,142148,143630,144500,146503,151039,...,210442,215731,212607,221358,220353,224631,229760,238777,248549,262444
2,3. decil,155258,157641,160792,163578,166795,169095,171292,174947,179714,...,371234,379309,379402,390453,391645,399578,409474,422830,438660,459379
3,4. decil,176861,179613,182871,185928,190010,192715,195643,200087,205721,...,554105,565237,569412,583168,587288,599665,615195,633538,657480,686762
4,5. decil,198711,202048,205633,209382,214412,217580,221437,226804,233247,...,759738,774619,783824,800748,808725,826469,848442,872334,906690,945798
5,6. decil,220835,224771,228906,233521,239499,243288,248126,254422,261607,...,988644,1008140,1023323,1044036,1056851,1080891,1110049,1140251,1187294,1237251
6,7. decil,245061,249659,254534,260358,267373,271974,277775,285080,293222,...,1243178,1268498,1290696,1316010,1334626,1365971,1403271,1440451,1502413,1564508
7,8. decil,274629,280171,285933,293589,301774,307454,314207,322842,331975,...,1529111,1562087,1592470,1623464,1648833,1688813,1735246,1780386,1859686,1936064
8,9. decil,318054,325162,332513,342856,352557,360069,368072,378649,388934,...,1861624,1904943,1945027,1983533,2016905,2067462,2124180,2179204,2279039,2373840
9,10. decil,500012,516944,533189,560971,573866,607665,615554,641188,648231,...,2394813,2465914,2518893,2591198,2632459,2708650,2772411,2867581,2999038,3153667


Now our data has been cleaned and our new values have been calculated. We now prepare to plot our data by putting it into lists.

In [160]:
# 1. Setting the number of lists we want
Num_Lists = 12

# 2. Naming our lists
List_Names = [f"Cum_Income_{i}" for i in Years_int]

# 3. Creating a list where we can store our 12 lists as sublists
Empty_Lists = [[] for i in range(Num_Lists)]

# 4. Creating a list for each year
for i, name in enumerate(List_Names):
    globals()[name] = Empty_Lists[i]

# 5. Adding a 0 to each list if it is empty, so every Lorenz Curve starts at the origin
for i in Empty_Lists:
    if i == []:
        i.append(0)


In [161]:
# 1. Calculating the cumulative income share in percent for each year and appending it to that year's list
for Year_Index, Cum_List in enumerate(Empty_Lists):
    # Only fill lists that hold nothing but the starting 0, so re-running the cell does not append twice
    if Cum_List == [0]:
        # a. Choosing the cumulative column that belongs to this list's year
        Cum_Col = LData["Cum_" + str(Years_int[Year_Index])]

        # b. Choosing the denominator: the cumulative income of the 10th decile
        Denominator = Cum_Col.iloc[9]

        for j in range(10):
            # i. Choosing the numerator in our dataset
            Numerator = Cum_Col.iloc[j]

            # ii. Calculating the value and appending it to the list for this year
            Cum_List.append((Numerator / Denominator) * 100)

# 2. Making a list with the cumulative population shares in percent (0, 10, ..., 100)
Cum_Num = list(range(0, 101, 10))        

# 3. Creating a list to store the zipped values
Zipped_Lists = [] 

# 4. Looping over our 12 lists, zipping each with "Cum_Num" and appending the result to "Zipped_Lists"
for l in Empty_Lists:
    Zipped_List = list(zip(Cum_Num, l))
    Zipped_Lists.append(Zipped_List)
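
As an alternative to filling lists in a nested loop, the same percentage shares can be computed column-wise with pandas. The sketch below is not the approach used in this project; it uses the `Cum_2012` and `Cum_2021` columns copied from the table above, and the names `cum`, `shares` and `lorenz` are our own.

```python
import pandas as pd

# Cumulative income per decile for two years, copied from the table above
cum = pd.DataFrame({
    "Cum_2012": [72143, 210442, 371234, 554105, 759738,
                 988644, 1243178, 1529111, 1861624, 2394813],
    "Cum_2021": [95473, 262444, 459379, 686762, 945798,
                 1237251, 1564508, 1936064, 2373840, 3153667],
})

# Divide each column by its last row (the total) to get shares in percent
shares = cum.div(cum.iloc[-1]) * 100

# Prepend the origin: 0% of the population holds 0% of the income
origin = pd.DataFrame([[0.0] * cum.shape[1]], columns=cum.columns)
lorenz = pd.concat([origin, shares], ignore_index=True)

# Label the rows with the cumulative population share in percent
lorenz.index = range(0, 101, 10)
```

With this layout, `lorenz.loc[20, "Cum_2021"]` gives the share of disposable income held by the poorest 20% in 2021, which is roughly 8.3% for these numbers.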

## Part 4: Creating Lorenz Diagram

Below we create our Lorenz Diagram and the slider for the diagram.

In [162]:
# 1. Making our widget slider
interactive_plot = widgets.interact(
    dp.L_Diagrams,  # Choosing the diagram function
    Empty_Lists=widgets.fixed(Empty_Lists),  # Passed unchanged; the slider does not control it
    Year=widgets.IntSlider(  # Setting the parameters and names for the widget
        description="Year",
        # Min, max and value are set to the corresponding years of our data
        min=2010,
        max=2021,
        step=1,
        value=2010,
        continuous_update=True,
        readout=True,
    ),
    Cum_Num=widgets.fixed(Cum_Num),  # Passed unchanged as well
)

[Interactive output: Lorenz Diagram (Plot 1) with a Year slider from 2010 to 2021]
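
The function `dp.L_Diagrams` lives in the `dataproject` module and is not shown in this notebook. The sketch below is a hypothetical stand-in that only illustrates what a function with the same keyword arguments (`Empty_Lists`, `Year`, `Cum_Num`) might do. The name `lorenz_diagram_sketch`, the `first_year` parameter, the assumption that the sublists are ordered by year, and the demo shares are all ours.

```python
import matplotlib.pyplot as plt

def lorenz_diagram_sketch(Empty_Lists, Year, Cum_Num, first_year=2010):
    """Hypothetical stand-in for dp.L_Diagrams: plot the Lorenz Curve for one year."""
    # Assumes one sublist per year, ordered from first_year onwards
    income_share = Empty_Lists[Year - first_year]

    fig, ax = plt.subplots(figsize=(6, 6))
    ax.plot(Cum_Num, income_share, label=f"Lorenz Curve {Year}")  # first line: blue by default
    ax.plot(Cum_Num, Cum_Num, label="45-degree line")             # second line: orange by default
    ax.set_xlabel("Cumulative share of population (%)")
    ax.set_ylabel("Cumulative share of disposable income (%)")
    ax.set_title(f"Lorenz Diagram, {Year}")
    ax.legend()

# Made-up income shares for a single year, for illustration only
Cum_Num_demo = list(range(0, 101, 10))
demo_lists = [[0, 3, 8, 15, 22, 30, 39, 50, 61, 75, 100]]
lorenz_diagram_sketch(demo_lists, Year=2010, Cum_Num=Cum_Num_demo)
```

A function of this shape could be passed to `widgets.interact` in the same way as `dp.L_Diagrams`, since it takes the slider value `Year` and looks up the matching sublist.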

## Part 5: Comparing income groups 

In this part we create a diagram which shows the average disposable income for each decile over the chosen period.

In [164]:
# 1. Selecting the income columns for 2010-2021 only (without the cumulative columns)
income_df = LData.iloc[:, 1:13] 

# 2. Making a list of the deciles
decil_list = LData.Decil.tolist()  

# 3. Making our multiple-selection widget
interactive_plot = widgets.interact(
    dp.Decile_Comp,  # Choosing the diagram function
    df=widgets.fixed(income_df),  # Passed unchanged; the widget does not control it
    decils=widgets.SelectMultiple(
        options=decil_list, 
        value=["1. decil"]
    ),
    decil_list=widgets.fixed(decil_list),
    income_df=widgets.fixed(income_df)
)

[Interactive output: average disposable income per decile, with a multiple-selection list of deciles]
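
As with the Lorenz Diagram, `dp.Decile_Comp` is defined in the `dataproject` module and is not shown here. The following is a hypothetical sketch of a comparable plotting function, not the project's implementation. The name `decile_comparison_sketch` and its reduced argument list are our own assumptions, and the demo values for 2010 to 2012 are copied from the table in Part 3.

```python
import pandas as pd
import matplotlib.pyplot as plt

def decile_comparison_sketch(income_df, decil_list, decils):
    """Hypothetical stand-in for dp.Decile_Comp: one line per selected decile."""
    fig, ax = plt.subplots(figsize=(8, 5))
    for name in decils:
        # Assumes the row position in income_df matches the position in decil_list
        row = income_df.iloc[decil_list.index(name)]
        ax.plot(row.index.astype(int), row.to_numpy(), marker="o", label=name)
    ax.set_xlabel("Year")
    ax.set_ylabel("Average disposable income")
    ax.legend()

# Two deciles and three years copied from the table in Part 3
demo_income = pd.DataFrame(
    {"2010": [58517, 500012], "2011": [61405, 516944], "2012": [72143, 533189]}
)
demo_deciles = ["1. decil", "10. decil"]
decile_comparison_sketch(demo_income, demo_deciles, decils=["1. decil", "10. decil"])
```

The `decils` argument mirrors the tuple of selected names that a `SelectMultiple` widget passes on, so each selection redraws only the chosen decile groups.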

## Part 6: Conclusion

From our Lorenz Diagram above, which shows the development in inequality over the period 2010 to 2021, we see that there has not been much change. In the period 2010 to 2013 the top decile of the population got a smaller share of disposable income, and the curve moved closer to the 45-degree line. 
In the following period inequality grew by a small amount, and the curve ended up almost in the same place as in 2010. The fall in inequality from 2010 to 2013 can also be seen at the 60% mark of the population, where the curve moved closer to the 45-degree line. 

In the last section we created a plot where you can choose different income groups and compare them (by holding CTRL while clicking in the selection list). If we compare all of the groups, the general trend from 2010 to 2021 is that "the rich get richer": the higher the decile group you are in, the steeper the curve. The plot therefore also shows a trend of higher inequality in absolute terms through the years. Considering the principle of compound interest, this overall makes good sense.

Furthermore, we see that decile groups 2 to 9 follow almost the same smooth upward-trending path. Decile groups 1 and 10 also trend upward, but their paths are a lot more "bumpy". It seems that these two groups are influenced a lot more by business cycles.
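
To put numbers on the steeper curves discussed above, the sketch below compares the change in absolute and in relative terms. It uses the 2010 and 2018 values for deciles 1, 5 and 10 copied from the table in Part 3. The choice of years and deciles is ours and is for illustration only, and the 1st decile should be read with the data disclaimer in Part 2 in mind.

```python
import pandas as pd

# Average disposable income for three deciles, copied from the table in Part 3
income = pd.DataFrame(
    {"2010": [58517, 198711, 500012], "2018": [78721, 233247, 648231]},
    index=["1. decil", "5. decil", "10. decil"],
)

# Change in absolute terms (same unit as the table) and in percent of the 2010 level
absolute_change = income["2018"] - income["2010"]
relative_change = absolute_change / income["2010"] * 100

summary = pd.DataFrame({
    "absolute_change": absolute_change,
    "relative_change_pct": relative_change.round(1),
})
print(summary)
```

The two columns do not have to rank the deciles in the same way, which is why it is worth stating whether a "steeper curve" refers to absolute or relative growth.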