# <u> **Correlation Analysis - Economic Performance vs. Organized Crime** </u>

## <u> Table of contents </u>
1. Data Sources
2. Data Preparation
   1. Libraries
   2. Reading Files
   3. Matching Country Names Completely 
   4. Matching Country Names Partially 
   5. Replacing Country Names
   6. Separating South and North America
   7. Rounding GDP Value
   8. Merging Dataframes
3. Data Visualization
   1. GDP
      1. Countries with Highest and Lowest Economic Performance
      2. Continental Economic Comparison
      3. Global Economic Comparison
   2. Organized Crime Index
      1. Countries with Highest and Lowest Organized Crime
      2. Global Crime Comparison
   3. Correlation
      1. Relationship between Economic Performance and Crime
      2. Correlation Values
4. Conclusion

## 1. Data Sources

1. The Organized Crime Index (https://ocindex.net)
   * Evaluates organized crime by countries
   * Index ranging from 1 (lowest crime) to 10 (highest crime)
   * Implemented by the Institute of Security Studies and INTERPOL

2. GDP per Capita in US Dollars (https://unstats.un.org/unsd/snaama/Basic)
   * Gross domestic product (GDP) used as a measure of economic performance per country
   * Per capita and in US dollars to make countries easier to compare
   * United Nations (UN) as the data source

**The data from<span style="color:red"> 2021 </span> will be analyzed below**


## 2. Data Preparation

* Problems:
  * Some countries are not named consistently across the datasets
  * The Americas are treated as one large continent

### 2.1 Libraries

In [1]:
# importing necessary libraries
import pandas as pd # data analysis
import plotly.express as px # data visualization
import numpy as np # mathematical operations

### 2.2 Reading Files

In [2]:
# storing file paths in variables
gdp_per_capita_path = "data/gdp_per_capita.xlsx"
organized_crime_path = "data/organized_crime.xlsx"
country_code_path = "data/country_codes.xlsx"

# reading data from xlsx files
gdp_data = pd.read_excel(gdp_per_capita_path)
crime_data = pd.read_excel(organized_crime_path, sheet_name="2021_dataset") # sheet_name selects the worksheet
codes_data = pd.read_excel(country_code_path, skiprows=2) # skipping the first two rows

### 2.3 Matching Country Names Completely

In [3]:
# country names in gdp that have no exact match in crime
gdp_countries_without_match = set(gdp_data["Country/Area"]).difference(set(crime_data.Country))
# country names in crime that have no exact match in gdp
crime_countries_without_gdp_match = set(crime_data.Country).difference(set(gdp_data["Country/Area"]))
# country names in codes that have no exact match in crime
code_countries_without_match = set(codes_data.Definition).difference(set(crime_data.Country))
# country names in crime that have no exact match in codes
crime_countries_without_code_match = set(crime_data.Country).difference(set(codes_data.Definition))
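
As a minimal sketch of what these set differences compute, the following uses two small, hypothetical name sets instead of the real Excel columns. The variable names `gdp_names` and `crime_names` are assumptions made for illustration only.

```python
# Hypothetical country-name sets; the notebook itself uses the Excel columns.
gdp_names = {"Germany", "Türkiye", "Kingdom of Eswatini"}
crime_names = {"Germany", "Turkey", "Eswatini"}

# names that appear in the GDP set but not in the crime set
only_in_gdp = gdp_names.difference(crime_names)
# names that appear in the crime set but not in the GDP set
only_in_crime = crime_names.difference(gdp_names)

print(sorted(only_in_gdp))
print(sorted(only_in_crime))
```

Both directions are needed because each side holds its own spelling of the same country.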

### 2.4 Matching Country Names Partially

In [5]:
# list of (replacing, to-be-replaced) country name tuples
replacing_gdp_countries = []

# which country names from crime are contained in a gdp name? (e.g. Eswatini in Kingdom of Eswatini)
for crime_country in crime_countries_without_gdp_match:
    for gdp_country in gdp_countries_without_match:
        if crime_country in gdp_country:
            pair = (crime_country, gdp_country) # avoids shadowing the built-in tuple
            replacing_gdp_countries.append(pair)

replacing_code_countries = []
# which country names from crime are contained in a codes name?
for crime_country in crime_countries_without_code_match:
    for code_country in code_countries_without_match:
        if crime_country in code_country:
            pair = (crime_country, code_country)
            replacing_code_countries.append(pair)
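
A plain substring test can also pair names that belong to different countries, so it may be worth reviewing the resulting list by hand. The sketch below uses hypothetical unmatched names (not taken from the real datasets) to show one intended and one unintended match.

```python
# Hypothetical unmatched names, chosen to show a false positive.
crime_unmatched = ["Eswatini", "Niger"]
gdp_unmatched = ["Kingdom of Eswatini", "Nigeria"]

# collect (short name, long name) pairs where the short name is a substring
pairs = [
    (short, long_name)
    for short in crime_unmatched
    for long_name in gdp_unmatched
    if short in long_name
]

for short, long_name in pairs:
    print(f"{short!r} is contained in {long_name!r}")
```

Here "Niger" is matched with "Nigeria" although they are different countries. In the notebook, the search only runs over names without an exact match, which limits this risk but does not rule it out.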

### 2.5 Replacing Country Names

In [6]:
# manually created list for matching country names from codes to crime
# each tuple is (replacing, to-be-replaced)
missing_code_countries = [
     ("Cabo Verde", "Cape Verde"),
     ("Congo, Rep.", "Congo"),
     ("Congo, Dem. Rep.", "Congo, The Democratic Republic of "),
     ("Czech Republic", "Czechia"),
     ("Côte d'Ivoire", "Cote d'Ivoire"),
     ("Eswatini", "Swaziland"),
     ("Kazakhstan", "Kazakstan"),
     ("Korea, DPR", "Korea, Democratic People's Republic of"),
     ("Korea, Rep.", "Korea, Republic of"),
     ("Laos", "Lao, People's Democratic Republic"),
     ("Micronesia (Federated States of)", "Micronesia, Federated States of"),
     ("North Macedonia", "Macedonia, The Former Yugoslav Republic Of"),
     ("St. Kitts and Nevis", "Saint Kitts & Nevis"),
     ("St. Lucia", "Saint Lucia"),
     ("St. Vincent and the Grenadines", "Saint Vincent and the Grenadines")
]
# manually created list for matching country names from gdp to crime
missing_gdp_countries = [
    ("Turkey", "Türkiye"),
    ("Korea, DPR", "Democratic People's Republic of Korea"),
    ("Vietnam", "Viet Nam"),
    ("Congo, Rep.", "Congo"),
    ("Korea, Rep.", "Republic of Korea"),
    ("Congo, Dem. Rep.", "Democratic Republic of the Congo"),
    ("St. Kitts and Nevis", "Saint Kitts and Nevis"),
    ("St. Vincent and the Grenadines", "Saint Vincent and the Grenadines"),
    ("St. Lucia", "Saint Lucia"),
    ("Czech Republic", "Czechia"),
    ("Laos", "Lao People's Democratic Republic")
    ]

# combining the manual replacements with the partial matches found above
replacement_code_countries = missing_code_countries + replacing_code_countries
replacement_gdp_countries = missing_gdp_countries + replacing_gdp_countries


# replacing country names from codes with the ones from crime
for replacement in replacement_code_countries:
    new, old = replacement
    codes_data.loc[codes_data["Definition"] == old, "Definition"] = new

# replacing country names from gdp with the ones from crime
for replacement in replacement_gdp_countries:
    new, old = replacement
    gdp_data.loc[gdp_data['Country/Area'] == old, 'Country/Area'] = new
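
The same replacement can be expressed without a loop. This is a sketch of an alternative, not the approach used above: it builds an old-to-new dictionary from the `(new, old)` tuples and passes it to `Series.replace`. The frame `gdp_demo` and its contents are hypothetical.

```python
import pandas as pd

# Hypothetical miniature of the GDP table.
gdp_demo = pd.DataFrame({"Country/Area": ["Türkiye", "Viet Nam", "Germany"]})

# (new, old) tuples, in the same order as the lists above
replacements = [("Turkey", "Türkiye"), ("Vietnam", "Viet Nam")]

# build an old -> new mapping and apply it in a single call
mapping = {old: new for new, old in replacements}
gdp_demo["Country/Area"] = gdp_demo["Country/Area"].replace(mapping)

print(gdp_demo["Country/Area"].tolist())
```

Names that are not keys of the mapping, such as "Germany" here, are left unchanged.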

### 2.6 Separating South and North America

In [7]:
# determination of the indices of the countries with "South America" as a region
change_continent = crime_data[crime_data.Region == 'South America'].index
# all countries with the region "South America" should also have "South America" as continent
crime_data.loc[change_continent, 'Continent'] = "South America"
# remaining countries with continent "Americas" become "North America"
crime_data.loc[crime_data.Continent == "Americas", "Continent"] = "North America"

### 2.7 Rounding GDP Value

In [8]:
# rounding gdp to two decimal places 
gdp_data["GDP, Per Capita GDP - US Dollars"] = gdp_data["GDP, Per Capita GDP - US Dollars"].round(2)

### 2.8 Merging Dataframes

In [9]:
# merging gdp and crime by country name with a left join (keeps every crime row)
merged_data = pd.merge(crime_data, gdp_data, left_on="Country", right_on="Country/Area", how="left")
# adding country codes by country name
merged_data = pd.merge(merged_data, codes_data, left_on="Country", right_on="Definition", how="left")
# dropping the duplicate country name columns
merged_data = merged_data.drop(columns=["Country/Area", "Definition"])
# dropping Taiwan because this analysis counts it as part of China
merged_data = merged_data.drop(merged_data[merged_data['Code Value'] == 'TWN'].index)
# new xlsx file with merged data
merged_data.to_excel("data/merged_data.xlsx", index=False)
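
A left join silently leaves GDP empty for every country whose name still does not match. One way to check for this, sketched here on hypothetical data, is the `indicator=True` option of `pd.merge`, which adds a `_merge` column describing where each row came from. The frames and values below are made up for illustration.

```python
import pandas as pd

# Hypothetical miniature tables; the values are not real index or GDP figures.
crime_demo = pd.DataFrame({"Country": ["Germany", "Turkey"], "Criminality": [5.0, 7.0]})
gdp_demo = pd.DataFrame({"Country/Area": ["Germany"], "GDP": [50000.0]})

# indicator=True adds a "_merge" column: "both", "left_only" or "right_only"
checked = pd.merge(crime_demo, gdp_demo, left_on="Country",
                   right_on="Country/Area", how="left", indicator=True)

# rows that found no partner in the GDP table
unmatched = checked.loc[checked["_merge"] == "left_only", "Country"].tolist()
print(unmatched)
```

An empty list would indicate that every country in the crime table found a GDP row.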

## 3. Data Visualization

### <u> 3.1 GDP </u>

### 3.1.1 Countries with Highest and Lowest Economic Performance

In [10]:
gdp_column = 'GDP, Per Capita GDP - US Dollars'

# extracting countries with highest and lowest gdp
highest_gdp = merged_data.nlargest(10, gdp_column)
lowest_gdp = merged_data.nsmallest(10, gdp_column)

# creating bar charts
highest_fig = px.bar(highest_gdp, x='Country', y=gdp_column, color=gdp_column, # assigning axes
             color_continuous_scale='viridis_r', range_color=[0, 250000], # color scale
             labels={'Country': 'Countries', gdp_column: 'GDP per Capita in USD'}) # axis labels

lowest_fig = px.bar(lowest_gdp, x='Country', y=gdp_column, color=gdp_column,
             color_continuous_scale='solar', range_color=[300, 600],
             labels={'Country': 'Countries', gdp_column: 'GDP per Capita in USD'})

# showing the value above each bar
for i, value in enumerate(highest_gdp[gdp_column]): # iterating over the bars
    highest_fig.add_annotation(x=highest_gdp['Country'].iloc[i], y=value,
                       text=str(round(value)), # converting value to text
                       showarrow=False, font=dict(size=13), yshift=9) # font size and vertical offset

for i, value in enumerate(lowest_gdp[gdp_column]):
    lowest_fig.add_annotation(x=lowest_gdp['Country'].iloc[i], y=value,
                       text=str(round(value)),
                       showarrow=False, font=dict(size=13), yshift=9)

# layout adjustments
highest_fig.update_layout(title="10 Countries with the Highest GDP per Capita", # title
                  height=500, width=1250, template='plotly_white') # size and design

lowest_fig.update_layout(title="10 Countries with the Lowest GDP per Capita",
                  height=500, width=1250, template='plotly_white')

# display charts
highest_fig.show()
lowest_fig.show()

### 3.1.2 Continental Economic Comparison

In [11]:
# counting countries per continent
# note: countries without a GDP value are counted here as well
count_countries = merged_data['Continent'].value_counts().reset_index()
count_countries.columns = ['Continent', 'Country Count']

# summing up gdp per continent
count_gdp = merged_data.groupby('Continent')[gdp_column].sum().reset_index()
count_gdp.columns = ['Continent', 'Total GDP']

# merging both dataframes by continent
avg_continent_gdp = pd.merge(count_countries, count_gdp, on='Continent')

# new column with average gdp
avg_continent_gdp['Avg GDP'] = (avg_continent_gdp['Total GDP'] / avg_continent_gdp['Country Count']).round(2)

avg_gdp = 'Avg GDP'

# creating bar chart
fig = px.bar(avg_continent_gdp, x='Continent', y=avg_gdp, color=avg_gdp,
             color_continuous_scale='viridis_r',
             range_color=[0, 50000],
             labels={'Continent': 'Continent', avg_gdp: 'GDP per Capita in USD'})

# labeling bars
for i, value in enumerate(avg_continent_gdp[avg_gdp]):
    fig.add_annotation(x=avg_continent_gdp['Continent'].iloc[i], y=value,
                       text=str(round(value)),
                       showarrow=False, font=dict(size=14), yshift=9)

# size, title and design
fig.update_layout(title="Continental GDP per Capita",
                  height=500, width=1250,  template='plotly_white')

# display chart
fig.show()
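
Dividing the summed GDP by the number of rows and calling `groupby(...).mean()` give the same result only when no GDP value is missing. The sketch below uses a hypothetical frame with one missing value to show the difference; whether the real data contains such gaps is not checked here.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one European country has no GDP value.
demo = pd.DataFrame({
    "Continent": ["Europe", "Europe", "Asia"],
    "GDP": [40000.0, np.nan, 10000.0],
})

# mean() ignores missing values, so only countries with data enter the average
mean_gdp = demo.groupby("Continent")["GDP"].mean()

# sum divided by the row count also counts the country without data
sum_over_rows = demo.groupby("Continent")["GDP"].sum() / demo["Continent"].value_counts()

print(mean_gdp)
print(sum_over_rows)
```

For "Europe" the first variant yields 40000 and the second 20000, because the row without a GDP value still increases the divisor.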

### 3.1.3 Global Economic Comparison

In [12]:
# new column with logarithmic gdp values
merged_data['log_gdp'] = np.log(merged_data[gdp_column])

# creating globe
fig = px.choropleth(merged_data, 
                    locations='Code Value', # assignment of countries with ISO3
                    color="log_gdp", # color scale
                    hover_name="Country", # name while hovering
                    color_continuous_scale="RdYlGn", # color scheme
                    hover_data= {gdp_column : True, "log_gdp": False, "Code Value" : False}, # hiding unnecessary values
                    projection='orthographic', # type of world map
                    title='GDP per Capita by Country in USD$', # title
                    template="plotly_white") # design

# adjusting size and color bar
fig.update_layout(
    height=550, width=700,
    coloraxis_colorbar=dict(
        title="GDP per Capita",  # title of the color bar
        tickvals=np.log([100, 1000, 10000, 100000]),  # tick positions in log space
        ticktext=["0.1k", "1k", "10k", "100k"]  # tick labels in USD
    )
)

fig.show()
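
The color bar ticks are given in log space because the map is colored by `log_gdp`. As a small sketch of why this helps, the following shows that each tenfold increase in GDP corresponds to the same step on the natural log scale. The array name `gdp_values` is an assumption for illustration.

```python
import numpy as np

# the GDP values used as tick positions above
gdp_values = np.array([100.0, 1000.0, 10000.0, 100000.0])

# natural logarithm, as used for the log_gdp column
log_values = np.log(gdp_values)

# distance between neighbouring ticks on the log scale
steps = np.diff(log_values)
print(steps)
```

Every step equals `np.log(10)`, so the ticks are evenly spaced on the color bar even though the underlying values span three orders of magnitude.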

### <u> 3.2 Organized Crime Index </u>

### 3.2.1 Countries with Highest and Lowest Organized Crime

In [13]:
crime = "Criminality"
# countries with highest and lowest crime
lowest_crime = merged_data.nsmallest(10, crime)
highest_crime = merged_data.nlargest(10, crime)

# creating bar chart
lowest_fig = px.bar(lowest_crime, x='Country', y=crime, color=crime,
             color_continuous_scale='viridis', 
             range_color=[1,3],
             labels={'Country': 'Countries', crime: 'Criminality'})

highest_fig = px.bar(highest_crime, x='Country', y=crime, color=crime,
             color_continuous_scale='solar_r', 
             range_color=[6,8],
             labels={'Country': 'Countries', crime: 'Criminality'})

# labeling bars
for i, yval in enumerate(lowest_crime[crime]):
    lowest_fig.add_annotation(x=i, y=yval +0.1,
                       text=str(yval),
                       showarrow=False)

for i, yval in enumerate(highest_crime[crime]):
    highest_fig.add_annotation(x=i, y=yval,
                       text=str(yval),
                       showarrow=False, font=dict(size=13), yshift=9)

# size, title and design
lowest_fig.update_layout(title="Countries with the lowest organized criminality",
                  height=500, width=1250,  template='plotly_white')

highest_fig.update_layout(title="Countries with the highest organized criminality",
                  height=500, width=1250,  template='plotly_white')


# display chart
lowest_fig.show()
highest_fig.show()

### 3.2.2 Global Criminality Comparison

In [14]:
# creating world map
crime_fig = px.choropleth(merged_data, 
                    locations='Code Value', # assignment of countries with ISO3
                    color="Criminality", # color scale
                    hover_name="Country", # name while hovering
                    color_continuous_scale="reds", # color scheme
                    hover_data= {gdp_column : True, "Criminality": False, "Code Value" : False}, # hiding unnecessary values
                    projection='natural earth', # type of world map
                    title='Organised crime per country', # title
                    template="plotly_white",  # design
                    height=500, width=1050) # size

resilience_fig = px.choropleth(merged_data, 
                    locations='Code Value',
                    color="Resilience", 
                    hover_name="Country", 
                    color_continuous_scale="teal", 
                    hover_data= {gdp_column : True, "Criminality": False, "Code Value" : False}, # hiding unnecessary hover values
                    projection='natural earth', 
                    title='Resilience against crime per country', 
                    template="plotly_white",  
                    height=500, width=1050) 


crime_fig.show()
resilience_fig.show()

### <u> 3.3 Correlation </u>

### 3.3.1 Relationship between Economic Performance and Organized Crime

In [15]:
# scatter chart of criminality and logarithmic gdp
fig = px.scatter(merged_data, 
                 y="Criminality", # defining y-axis
                 x=gdp_column, # defining x-axis
                 log_x= True, # logarithmic gdp
                 color="Continent", # coloring dots by continent
                 hover_name="Country", # value while hovering
                 title="Relation between Criminality and GDP") #title

fig.show()

### 3.3.2 Correlation matrix

In [17]:
# selecting columns
columns = ["Criminality", "Resilience", gdp_column]
# Spearman rank correlation (chosen because the index scores are ordinal)
correlation_matrix = merged_data[columns].corr(method="spearman").round(2)


# creating heatmap
fig = px.imshow(correlation_matrix,
                x=correlation_matrix.columns,
                y=correlation_matrix.columns,
                color_continuous_scale='RdBu',
                range_color=[-1,1],
                title='Correlation matrix',
                text_auto=True) # boolean True shows the values in the cells
fig.show()
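
Spearman's coefficient is the Pearson correlation computed on ranks instead of raw values, which is why it suits ordinal scores. The sketch below checks this on a small hypothetical frame; the numbers are made up and the short column name `GDP` is used only for illustration.

```python
import pandas as pd

# Hypothetical scores and GDP values, not taken from the real data.
demo = pd.DataFrame({
    "Criminality": [2.0, 4.5, 5.0, 7.0],
    "GDP": [90000.0, 30000.0, 35000.0, 1000.0],
})

# Spearman correlation as computed in the cell above
spearman = demo.corr(method="spearman").loc["Criminality", "GDP"]

# Pearson correlation on the ranked values
pearson_on_ranks = demo.rank().corr(method="pearson").loc["Criminality", "GDP"]

print(round(spearman, 2), round(pearson_on_ranks, 2))
```

Both values agree, and only the ordering of the observations matters, not the size of the gaps between them.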

## 4. Conclusion

* There is a moderate negative correlation between organized crime and GDP per capita (Spearman's ρ = -0.37)
    * Meaning: countries with higher economic performance tend to have lower organized crime
* There is a strong positive correlation between resilience and GDP per capita (ρ = 0.73)
    * Meaning: countries with higher economic performance tend to be more resilient to organized crime
* There is a moderate negative correlation between organized crime and resilience (ρ = -0.46)
    * Meaning: countries with high resilience tend to have lower organized crime