# YOUR PROJECT TITLE

> **Note the following:** 
> 1. This is *not* meant to be an example of an actual **data analysis project**, just an example of how to structure such a project.
> 1. Remember the general advice on structuring and commenting your code.
> 1. The `dataproject.py` file includes a function which can be used multiple times in this notebook.

Imports and set magics:

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import ipywidgets as widgets
from matplotlib_venn import venn2

# autoreload modules when code is run
%load_ext autoreload
%autoreload 2

# user written modules
import dataproject
```


# Read and clean data

Import your data, either through an API or manually, and load it. 

First we import the income data for each decile from an Excel file and clean it:

```python
LorenzData = "Indkomst Data.xlsx"  # Excel file with income per decile, 2010-2021
LData = pd.read_excel(LorenzData, skiprows=2)  # Skip the first two rows above the header
LData.drop("Unnamed: 0", axis=1, inplace=True)  # Remove the empty first column
LData.rename(columns={"Unnamed: 1": "Decil"}, inplace=True)  # Name the decile column

LData = LData.dropna()  # Remove the rows with missing values at the bottom of the sheet

# Convert all year columns to integers (removes the decimal points).
# Assigning by column label replaces the columns and avoids the pandas FutureWarning
# that assigning through iloc triggered.
Year_cols = LData.columns[1:]
LData[Year_cols] = LData[Year_cols].astype(int)

LData
```




|  | Decil | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1. decil | 58517 | 61405 | 72143 | 75689 | 70459 | 77728 | 75853 | 78128 | 78721 | 83792 | 88188 | 95473 |
| 1 | 2. decil | 133314 | 135247 | 138299 | 140042 | 142148 | 143630 | 144500 | 146503 | 151039 | 154985 | 160361 | 166971 |
| 2 | 3. decil | 155258 | 157641 | 160792 | 163578 | 166795 | 169095 | 171292 | 174947 | 179714 | 184053 | 190111 | 196935 |
| 3 | 4. decil | 176861 | 179613 | 182871 | 185928 | 190010 | 192715 | 195643 | 200087 | 205721 | 210708 | 218820 | 227383 |
| 4 | 5. decil | 198711 | 202048 | 205633 | 209382 | 214412 | 217580 | 221437 | 226804 | 233247 | 238796 | 249210 | 259036 |
| 5 | 6. decil | 220835 | 224771 | 228906 | 233521 | 239499 | 243288 | 248126 | 254422 | 261607 | 267917 | 280604 | 291453 |
| 6 | 7. decil | 245061 | 249659 | 254534 | 260358 | 267373 | 271974 | 277775 | 285080 | 293222 | 300200 | 315119 | 327257 |
| 7 | 8. decil | 274629 | 280171 | 285933 | 293589 | 301774 | 307454 | 314207 | 322842 | 331975 | 339935 | 357273 | 371556 |
| 8 | 9. decil | 318054 | 325162 | 332513 | 342856 | 352557 | 360069 | 368072 | 378649 | 388934 | 398818 | 419353 | 437776 |
| 9 | 10. decil | 500012 | 516944 | 533189 | 560971 | 573866 | 607665 | 615554 | 641188 | 648231 | 688377 | 719999 | 779827 |
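A minimal, self-contained illustration of the cleaning step, assuming a toy DataFrame built from the 2010 and 2011 incomes of the first two deciles above plus a hypothetical footer row with missing values (the real sheet layout may differ). It drops the incomplete row and converts the year columns to integers by column label, which replaces the float columns instead of writing into them in place:

```python
import numpy as np
import pandas as pd

# Toy data: two deciles from the table above plus a hypothetical footer row with missing values
raw = pd.DataFrame({
    "Decil": ["1. decil", "2. decil", np.nan],
    "2010": [58517.0, 133314.0, np.nan],
    "2011": [61405.0, 135247.0, np.nan],
})

# Drop rows with missing values; copy so later assignments do not touch `raw`
clean = raw.dropna().copy()

# Convert the year columns by label: this replaces the float columns with integer columns
year_cols = clean.columns[1:]
clean[year_cols] = clean[year_cols].astype("int64")

print(clean.dtypes)
```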


## Explore each data set

```python
Years_int = np.arange(2010, 2022)  # The years 2010-2021 as integers
print(Years_int)

# Use a for loop to calculate the cumulative income for each year
for i in Years_int:
    Col_name = "Cum_" + str(i)  # Name of the new column, e.g. "Cum_2010"
    LData[Col_name] = LData[str(i)].cumsum()  # Running total of income over the deciles

LData
```

```
[2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021]
```


|  | Decil | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | ... | Cum_2012 | Cum_2013 | Cum_2014 | Cum_2015 | Cum_2016 | Cum_2017 | Cum_2018 | Cum_2019 | Cum_2020 | Cum_2021 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1. decil | 58517 | 61405 | 72143 | 75689 | 70459 | 77728 | 75853 | 78128 | 78721 | ... | 72143 | 75689 | 70459 | 77728 | 75853 | 78128 | 78721 | 83792 | 88188 | 95473 |
| 1 | 2. decil | 133314 | 135247 | 138299 | 140042 | 142148 | 143630 | 144500 | 146503 | 151039 | ... | 210442 | 215731 | 212607 | 221358 | 220353 | 224631 | 229760 | 238777 | 248549 | 262444 |
| 2 | 3. decil | 155258 | 157641 | 160792 | 163578 | 166795 | 169095 | 171292 | 174947 | 179714 | ... | 371234 | 379309 | 379402 | 390453 | 391645 | 399578 | 409474 | 422830 | 438660 | 459379 |
| 3 | 4. decil | 176861 | 179613 | 182871 | 185928 | 190010 | 192715 | 195643 | 200087 | 205721 | ... | 554105 | 565237 | 569412 | 583168 | 587288 | 599665 | 615195 | 633538 | 657480 | 686762 |
| 4 | 5. decil | 198711 | 202048 | 205633 | 209382 | 214412 | 217580 | 221437 | 226804 | 233247 | ... | 759738 | 774619 | 783824 | 800748 | 808725 | 826469 | 848442 | 872334 | 906690 | 945798 |
| 5 | 6. decil | 220835 | 224771 | 228906 | 233521 | 239499 | 243288 | 248126 | 254422 | 261607 | ... | 988644 | 1008140 | 1023323 | 1044036 | 1056851 | 1080891 | 1110049 | 1140251 | 1187294 | 1237251 |
| 6 | 7. decil | 245061 | 249659 | 254534 | 260358 | 267373 | 271974 | 277775 | 285080 | 293222 | ... | 1243178 | 1268498 | 1290696 | 1316010 | 1334626 | 1365971 | 1403271 | 1440451 | 1502413 | 1564508 |
| 7 | 8. decil | 274629 | 280171 | 285933 | 293589 | 301774 | 307454 | 314207 | 322842 | 331975 | ... | 1529111 | 1562087 | 1592470 | 1623464 | 1648833 | 1688813 | 1735246 | 1780386 | 1859686 | 1936064 |
| 8 | 9. decil | 318054 | 325162 | 332513 | 342856 | 352557 | 360069 | 368072 | 378649 | 388934 | ... | 1861624 | 1904943 | 1945027 | 1983533 | 2016905 | 2067462 | 2124180 | 2179204 | 2279039 | 2373840 |
| 9 | 10. decil | 500012 | 516944 | 533189 | 560971 | 573866 | 607665 | 615554 | 641188 | 648231 | ... | 2394813 | 2465914 | 2518893 | 2591198 | 2632459 | 2708650 | 2772411 | 2867581 | 2999038 | 3153667 |
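The `cumsum` step is a running total down the deciles. A dependency-free sketch with `itertools.accumulate`, using the 2021 decile incomes copied from the first table, reproduces the `Cum_2021` column; its last value is the total income of all deciles:

```python
from itertools import accumulate

# Income per decile in 2021, copied from the table above (1st to 10th decile)
income_2021 = [95473, 166971, 196935, 227383, 259036,
               291453, 327257, 371556, 437776, 779827]

# Running total over the deciles, the same idea as LData["2021"].cumsum()
cum_2021 = list(accumulate(income_2021))

print(cum_2021)
# [95473, 262444, 459379, 686762, 945798, 1237251, 1564508, 1936064, 2373840, 3153667]
```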


Total income across all ten deciles in 2021 (the last value of the cumulative column):

```python
Lorenz = LData["Cum_2021"].iloc[-1]  # Last row of the cumulative column = total income in 2021
print(Lorenz)
```

```
3153667
```


Cumulative income share in percent for each decile in 2010, starting with 0:

```python
Cum_Income_20 = [0]  # The Lorenz curve starts at (0 %, 0 %)

Denominator = LData["Cum_2010"].iloc[-1]  # Total income in 2010
for i in range(10):
    Numerator = LData["Cum_2010"].iloc[i]  # Cumulative income up to decile i+1
    Value = (Numerator/Denominator)*100
    Cum_Income_20.append(Value)

print(Cum_Income_20)
```

```
[0, 2.5651265182452443, 8.409022764692372, 15.214846934928714, 22.96765109685383, 31.678262638235495, 41.358692507447664, 52.1010830894614, 64.13960404199098, 78.08168496948167, 100.0]
```
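The same calculation can be wrapped in a small function and reused for every year, for example by placing it in `dataproject.py`. `cumulative_shares` is a hypothetical helper name, not something the module is known to contain; the sketch uses the 2010 decile incomes from the first table:

```python
from itertools import accumulate


def cumulative_shares(incomes):
    """Return cumulative income shares in percent, starting with 0 (hypothetical helper)."""
    running = list(accumulate(incomes))  # cumulative income up to each decile
    total = running[-1]                  # total income of all deciles
    return [0] + [value / total * 100 for value in running]


# Income per decile in 2010, copied from the table above
income_2010 = [58517, 133314, 155258, 176861, 198711,
               220835, 245061, 274629, 318054, 500012]

shares_2010 = cumulative_shares(income_2010)
print(shares_2010)
```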


Generating 12 lists, one per year, to hold the cumulative income shares. Each list starts with a 0:

```python
Num_Lists = len(Years_int)  # One list per year, 2010-2021

List_Names = [f"Cum_Income_{i}" for i in Years_int]

# Every list starts with 0, the origin of the Lorenz curve
empty_lists = [[0] for i in range(Num_Lists)]

# Make each list available under its own name, e.g. Cum_Income_2013
for i, name in enumerate(List_Names):
    globals()[name] = empty_lists[i]

print(Cum_Income_2013)
```

```
[0]
```
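Creating variables through `globals()` works, but a dictionary keyed by year is usually easier to loop over and avoids hidden global names. A sketch of that alternative, where the name `Cum_Income` is chosen only for illustration:

```python
# One list per year, each starting with 0 (the origin of the Lorenz curve).
# The comprehension creates a new list object for every year, unlike [[0]] * 12.
years = range(2010, 2022)
Cum_Income = {year: [0] for year in years}

print(len(Cum_Income), Cum_Income[2013])  # prints: 12 [0]
```

Lists are then looked up by year, e.g. `Cum_Income[2013]`, instead of by variable name.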


Calculating the cumulative income in percent for every year and putting all values into one list:

```python
Cum_Income_List = []

# For each year, divide each decile's cumulative income by the year's total (last row), in percent
for year in Years_int:
    Cum_col = LData[f"Cum_{year}"]
    Denominator = Cum_col.iloc[-1]
    for Numerator in Cum_col:
        Value = (Numerator/Denominator)*100
        Cum_Income_List.append(Value)

print(Cum_Income_List)
```

```
[2.5651265182452443, 8.409022764692372, 15.214846934928714, 22.96765109685383, 31.678262638235495, 41.358692507447664, 52.1010830894614, 64.13960404199098, 78.08168496948167, 100.0, 2.6324013647932554, 8.430372008620198, 15.188362132345848, 22.888280808913084, 31.549976614690262, 41.18579596435144, 51.888551315429034, 63.89934071003031, 77.83887157199439, 100.0, 3.0124690320288057, 8.787408453186115, 15.501586136370566, 23.137714719270356, 31.724314174008576, 41.28272228353529, 51.91127657984151, 63.850956212447485, 77.73567288969953, 100.0, 3.0694095576731386, 8.748520832437789, 15.382085506631618, 22.922007823468295, 31.413058200732063, 40.883015384964764, 51.4412911399181, 63.347180801925774, 77.25099091046971, 100.0, 2.797220842647941, 8.440493502502886, 15.062251552566941, 22.60564462245915, 31.11779658762798, 40.62590193390509, 51.240604503645045, 63.22102606184542, 77.217531669666, 100.0, 2.9996935780283867, 8.542689520445755, 15.068435526733195, 22.505729010287904, 30.902617244
```

Slicing the flat list into one piece per year and adding the pieces to our separate lists:

```python
Cum_Len = len(Cum_Income_List)
print(Cum_Len)
Elements_per_list = Cum_Len // len(empty_lists)  # 120 values / 12 years = 10 deciles per year

# Append each year's slice to that year's list (which already starts with 0)
for k, year_list in enumerate(empty_lists):
    year_list.extend(Cum_Income_List[k*Elements_per_list:(k+1)*Elements_per_list])

print(Cum_Income_2010)
```

```
120
[0, 2.5651265182452443, 8.409022764692372, 15.214846934928714, 22.96765109685383, 31.678262638235495, 41.358692507447664, 52.1010830894614, 64.13960404199098, 78.08168496948167, 100.0]
```
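The slice size is best derived from the number of years rather than hard-coded, and it is worth checking that the flat list divides evenly. A sketch of such a slicing helper; `split_by_year` is a hypothetical name, and the usage example runs on a small made-up list rather than the real data:

```python
def split_by_year(flat_values, years):
    """Split a flat list into one list per year, each prefixed with 0 (hypothetical helper)."""
    if len(flat_values) % len(years) != 0:
        raise ValueError("the flat list does not divide evenly into the given years")
    size = len(flat_values) // len(years)  # values per year, e.g. 120 // 12 = 10
    return {
        year: [0] + flat_values[k * size:(k + 1) * size]
        for k, year in enumerate(years)
    }


# Usage with a small made-up flat list: 2 years x 3 values
example = split_by_year([10.0, 40.0, 100.0, 20.0, 50.0, 100.0], [2020, 2021])
print(example)  # {2020: [0, 10.0, 40.0, 100.0], 2021: [0, 20.0, 50.0, 100.0]}
```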


Making a list with the cumulative population share in percent (10 percentage points per decile):

```python
# Cumulative population share: 0, 10, ..., 100 %
Cum_Num = list(range(0, 101, 10))

print(Cum_Num)
```

```
[0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
```
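Pairing the population shares with a year's cumulative income shares gives the points of the Lorenz curve. Because the deciles are ordered from lowest to highest income, every point should lie on or below the line of perfect equality (income share equal to population share). A quick check, using the 2021 incomes from the first table:

```python
from itertools import accumulate

Cum_Num = list(range(0, 101, 10))  # cumulative population share in percent

# Income per decile in 2021, copied from the table above
income_2021 = [95473, 166971, 196935, 227383, 259036,
               291453, 327257, 371556, 437776, 779827]
running = list(accumulate(income_2021))
shares_2021 = [0] + [value / running[-1] * 100 for value in running]

# Lorenz curve points: (population share, income share), both in percent
lorenz_points = list(zip(Cum_Num, shares_2021))

# With incomes sorted in ascending order the curve never rises above the 45-degree line
assert all(income <= population for population, income in lorenz_points)
print(lorenz_points[1])
```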


To do:

- Combine the different lists and put them in a DataFrame
- Plot the lot
- Add a row of zeros to our data set as a stand-in for the "0th decile"
- Calculate the cumulative deciles
- Calculate the cumulative income in percent
- Plot the Lorenz curve
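The first to-do items can be sketched with pandas and matplotlib, assuming the per-year income lists are collected in a dictionary keyed by year. To keep the block runnable on its own it only uses the 2020 and 2021 incomes from the first table; the 45-degree line marks perfect equality:

```python
from itertools import accumulate

import matplotlib.pyplot as plt
import pandas as pd

# Income per decile, copied from the table above (only two years, to keep the sketch short)
incomes = {
    2020: [88188, 160361, 190111, 218820, 249210, 280604, 315119, 357273, 419353, 719999],
    2021: [95473, 166971, 196935, 227383, 259036, 291453, 327257, 371556, 437776, 779827],
}


def cumulative_shares(values):
    """Cumulative income shares in percent, starting with 0."""
    running = list(accumulate(values))
    return [0] + [v / running[-1] * 100 for v in running]


# One row per cumulative population share, one column per year
lorenz_df = pd.DataFrame(
    {year: cumulative_shares(values) for year, values in incomes.items()},
    index=pd.Index(range(0, 101, 10), name="Cumulative population (%)"),
)

# Plot the Lorenz curves together with the line of perfect equality
fig, ax = plt.subplots()
for year in lorenz_df.columns:
    ax.plot(lorenz_df.index, lorenz_df[year], marker="o", label=str(year))
ax.plot([0, 100], [0, 100], linestyle="--", color="grey", label="Perfect equality")
ax.set_xlabel("Cumulative population (%)")
ax.set_ylabel("Cumulative income (%)")
ax.set_title("Lorenz curves")
ax.legend()
```

In the notebook the figure is shown automatically; elsewhere, call `plt.show()` or `fig.savefig(...)`.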

In order to be able to **explore the raw data**, you may provide **static** and **interactive plots** to show important developments.

Explain what you see when moving elements of the interactive plot around. 

# Merge data sets

Now you create combinations of your loaded data sets. Remember the illustration of an (inner) **merge**.

Here we are dropping elements from both data set X and data set Y. A left join would keep all observations in data set X intact and subset only from Y.

Make sure that your resulting data sets have the correct number of rows and columns. That is, be clear about which observations are thrown away. 
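To see which observations a merge throws away, pandas' `merge` accepts `indicator=True`, which adds a `_merge` column marking each row as `both`, `left_only` or `right_only`. A sketch with two small hypothetical data sets X and Y (the columns and values are made up for illustration):

```python
import pandas as pd

# Two small hypothetical data sets sharing the key column "year"
X = pd.DataFrame({"year": [2019, 2020, 2021], "income": [1.0, 2.0, 3.0]})
Y = pd.DataFrame({"year": [2020, 2021, 2022], "population": [10, 20, 30]})

inner = pd.merge(X, Y, on="year", how="inner")  # keeps only years found in both
left = pd.merge(X, Y, on="year", how="left")    # keeps every row of X; missing Y values become NaN

# An outer merge with indicator=True shows where each observation came from
outer = pd.merge(X, Y, on="year", how="outer", indicator=True)
print(outer["_merge"].value_counts())

print(f"inner: {inner.shape}, left: {left.shape}")  # inner: (2, 3), left: (3, 3)
```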

**Note:** Don't make Venn diagrams in your own data project. It is just for exposition. 

# Analysis

To get a quick overview of the data, we show some **summary statistics** on a meaningful aggregation. 

MAKE FURTHER ANALYSIS. EXPLAIN THE CODE BRIEFLY AND SUMMARIZE THE RESULTS.

# Conclusion

ADD CONCISE CONCLUSION.