Abstract

This project examines whether the intensity of religious belief in a society has any meaningful relationship with its murder rate, focusing specifically on the year 2020. We combine homicide data with global religious composition measures to see if stronger religious adherence or practice is associated with higher or lower levels of violence. Our approach uses metrics such as percentage of population affiliated with a religion, frequency of religious observance, and indicators of religious freedom to quantify “religion intensity.” We also incorporate social factors like income inequality, education levels, and law-enforcement capacity to understand whether these variables strengthen or weaken the connection between religion and murder rates. The goal is not just to test for correlation, but to evaluate whether religion could plausibly play a causal role once broader societal conditions are accounted for. By comparing countries and regions, this analysis gives us a clearer picture of how religious commitment functions in modern societies and whether it meaningfully shapes patterns of violence.

Data (to be updated for sex crimes, economic crimes and homicide)

- Grouping crimes 

Our project uses two main datasets: a global homicide dataset (including sexual crimes) for 2020 from the United Nations Office on Drugs and Crime (UNODC) and an international religious composition dataset covering 2020 from the Pew Research Center. These were combined to examine whether the intensity of religious belief or practice relates to murder rates across countries. Both datasets are real, publicly available, and meet the assignment requirement that our data be recent and verifiable.

The Homicides data provides the number of murders recorded in each country for the year 2020. Each row represents a country and includes fields such as total homicides and population size, allowing us to compute per-capita murder rates. This dataset gives us a consistent measure of violent crime across different regions, which is essential for comparing societies fairly.
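
As a rough illustration of the per-capita calculation, the sketch below computes homicides per 100,000 residents, a common convention for reporting such rates. The notebook code further down expresses the same ratio as a percentage of population instead. The function name `homicide_rate_per_100k` and the `sample` frame are assumptions made for this example; the two rows are copied from the merged output shown at the end of this section.

```python
import pandas as pd


def homicide_rate_per_100k(homicides: pd.Series, population: pd.Series) -> pd.Series:
    """Return homicides per 100,000 residents (hypothetical helper)."""
    return homicides / population * 100_000


# Two rows copied from the merged output table in this section
sample = pd.DataFrame({
    "Country": ["Albania", "Armenia"],
    "VALUE_hom": [394.0, 248.0],
    "Population": [2871954, 2890893],
})

sample["Homicides per 100k"] = homicide_rate_per_100k(
    sample["VALUE_hom"], sample["Population"]
)
print(sample)
```

Multiplying the notebook's percentage figures by 1,000 gives the same per-100,000 values, so the two scales rank countries identically.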

The Religious Composition file contains detailed counts of religious affiliation for every major world religion. For each country, it reports the number of people identifying with major religious groups (such as Christianity, Islam, Hinduism, Buddhism, folk religions, and the unaffiliated). Because the dataset includes multiple years, we restricted our analysis to 2020 to match the homicide data. From this file, we generated indicators of “religion intensity,” such as the percentage of the population adhering to any religion (computed as "Religion Density", one minus the religiously unaffiliated share) and the relative size of dominant religious groups. We also developed "Homicide Density", the number of homicides as a percentage of the population.
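
The relative size of the dominant religious group could be derived along the following lines. This is only a sketch: the column names `Dominant Group` and `Dominant Share` and the list `RELIGIOUS_GROUPS` are assumptions for illustration, and the two rows are copied from the merged output shown at the end of this section.

```python
import pandas as pd

# Affiliated groups only; the unaffiliated column is deliberately excluded
RELIGIOUS_GROUPS = ["Christians", "Muslims", "Buddhists", "Hindus", "Jews", "Other_religions"]

# Two rows copied from the merged output table in this section
sample = pd.DataFrame({
    "Country": ["Albania", "Australia"],
    "Population": [2871954, 25743791],
    "Christians": [511657, 12035331],
    "Muslims": [2139813, 901843],
    "Buddhists": [5, 672312],
    "Hindus": [20, 762897],
    "Jews": [289, 108230],
    "Other_religions": [382, 362221],
})

# Name of the largest affiliated group in each country
sample["Dominant Group"] = sample[RELIGIOUS_GROUPS].idxmax(axis=1)
# That group's size as a fraction of the total population
sample["Dominant Share"] = sample[RELIGIOUS_GROUPS].max(axis=1) / sample["Population"]

print(sample[["Country", "Dominant Group", "Dominant Share"]])
```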

We merged the two datasets using country names as the key, producing a single table that links murder rates with religious adherence levels. While the homicide data gives us the outcome we want to study, the religious composition data helps quantify how religious each society is. Together, they form the foundation of the analysis and allow us to explore whether differences in religious intensity correlate with differences in murder rates.
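
Because an inner join on country names silently drops any country whose name is spelled differently in the two sources, it can be worth checking for unmatched names before merging. The sketch below uses an outer join with pandas' `indicator=True` option to list them. The frames `religion_names` and `homicide_names` and the mismatched entries are placeholders invented for this example, not values from our data.

```python
import pandas as pd

# Placeholder name lists; the third entry in each simulates a spelling mismatch
religion_names = pd.DataFrame({"Country": ["Albania", "Algeria", "Country A (spelling 1)"]})
homicide_names = pd.DataFrame({"Country": ["Albania", "Algeria", "Country A (spelling 2)"]})

# An outer join keeps every name; the _merge column records which source(s) had it
check = pd.merge(religion_names, homicide_names, how="outer", on="Country", indicator=True)

# Rows found in only one source would be lost by an inner join
unmatched = check[check["_merge"] != "both"]
print(unmatched)
```

Any names listed here could then be harmonized by hand before running the inner merge.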

We also included additional societal indicators, such as income inequality and education levels, so that we can test whether religion still matters after accounting for other major factors.
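
One way to test whether religion still matters after accounting for another factor is an ordinary least squares fit that includes the control alongside the religion measure. The sketch below is a minimal illustration with NumPy, not our final model. It assumes "Corruption Density" as a stand-in control, since the income inequality and education columns are not part of the tables shown in this section, and it uses only the five rows from the merged output below, which is far too few for a meaningful estimate.

```python
import numpy as np

# Values copied from the five rows of the merged output table in this section
religion_density = np.array([0.923471, 0.987338, 0.907671, 0.988221, 0.57656])
corruption_density = np.array([0.253347, 0.009554, 0.508515, 0.078834, 2.634227])
homicide_density = np.array([0.013719, 0.012154, 0.054395, 0.008579, 0.00585])

# Design matrix: intercept, religion measure, and one control (stand-in assumption)
X = np.column_stack([np.ones_like(religion_density), religion_density, corruption_density])

# Ordinary least squares fit of homicide density on the two predictors
beta, *_ = np.linalg.lstsq(X, homicide_density, rcond=None)
residuals = homicide_density - X @ beta

for name, value in zip(["intercept", "Religion Density", "Corruption Density (control)"], beta):
    print(f"{name}: {value:.4f}")
```

The coefficient on "Religion Density" is the association that remains once the control is held fixed. With the full merged table, the same design matrix can be extended with one column per additional indicator.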


In [None]:
import pandas as pd

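# Load the UNODC homicide records
df = pd.read_csv('Homicides.csv')
# Drop descriptive columns that are not needed for country-level totals
df = df.drop(columns=['Region', 'Subregion', 'Dimension', 'Category', 'Year', 'Unit of measurement', 'Source'])
df.head()

# Total VALUE per country. Note: this sums every row for a country, so it
# aggregates across whatever dimensions, categories, years, and units the file contains.
country = df.groupby(by='Country')['VALUE'].sum()
country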

In [None]:
# Religion
import numpy as np

religion = pd.read_csv('religion.csv')
religion = religion.drop(columns=['Region', 'Level', 'Countrycode'])
# Keep only 2020 to match the homicide data
religion = religion.query('Year == 2020').copy()

# Counts may contain thousands separators; strip the commas before converting to int
for col in ['Population', 'Religiously_unaffiliated']:
    religion[col] = religion[col].astype(str).str.replace(',', '', regex=False)

# Share of the population affiliated with any religion: 1 - unaffiliated share
religion['Religion Density'] = 1 - (religion['Religiously_unaffiliated'].astype(int) / religion['Population'].astype(int))
religion

In [None]:
# Sexual crimes
sex = pd.read_csv('Sex.csv')
sex = sex.drop(columns=['Iso3_code', 'Region', 'Subregion', 'Indicator', 'Dimension', 'Category'])
# Total VALUE per country
sex = sex.groupby(by='Country')['VALUE'].sum()
sex

In [None]:
# Corruption
corruption = pd.read_csv('Corruption.csv')
# Keep only rows reported as raw counts
corruption = corruption.query('`Unit of measurement` == "Counts"')
# Total VALUE per country
corruption = corruption.groupby(by='Country')['VALUE'].sum()
corruption

In [62]:
# Merging
# Inner joins keep only countries present in all four sources
merged_df = pd.merge(religion, country, how='inner', on=['Country'])
# Both series are named VALUE, so the suffixes tell homicides and sexual crimes apart
merged_df = pd.merge(merged_df, sex, how='inner', on=['Country'], suffixes=('_hom', '_sex'))
merged_df = pd.merge(merged_df, corruption, how='inner', on=['Country'])
merged_df = merged_df.rename(columns={'VALUE': 'VALUES_corr'})

# Densities are counts expressed as a percentage of the population
population = merged_df['Population'].astype(int)
merged_df['Homicide Density'] = merged_df['VALUE_hom'].astype(int) / population * 100
merged_df['Sex Assault Density'] = merged_df['VALUE_sex'].astype(int) / population * 100
merged_df['Corruption Density'] = merged_df['VALUES_corr'].astype(int) / population * 100
# merged_df.sort_values(by='Homicide Density', ascending=False)
merged_df.head()


Unnamed: 0,Country,Year,Population,Christians,Muslims,Religiously_unaffiliated,Buddhists,Hindus,Jews,Other_religions,Religion Density,VALUE_hom,VALUE_sex,VALUES_corr,Homicide Density,Sex Assault Density,Corruption Density
0,Albania,2020,2871954,511657,2139813,219787,5,20,289,382,0.923471,394.0,1055,7276.0,0.013719,0.036735,0.253347
1,Algeria,2020,44042091,129920,43329641,557664,6607,0,57,18202,0.987338,5353.0,11948,4208.0,0.012154,0.027129,0.009554
2,Argentina,2020,45191965,39974074,419922,4172533,14038,1163,173979,436255,0.907671,24582.0,191104,229808.0,0.054395,0.422872,0.508515
3,Armenia,2020,2890893,2813205,7712,34051,309,216,103,35296,0.988221,248.0,412,2279.0,0.008579,0.014252,0.078834
4,Australia,2020,25743791,12035331,901843,10900956,672312,762897,108230,362221,0.57656,1506.0,111312,678150.0,0.00585,0.432384,2.634227
