## Crime and Guns: Do gun laws and policies affect crime in 2025?

We'll use the gun-law table from Wikipedia:
https://en.wikipedia.org/wiki/Overview_of_gun_laws_by_nation
and the crime index table from
https://worldpopulationreview.com/country-rankings/crime-rate-by-country

In [189]:
import requests
import pandas as pd
from io import StringIO
import numpy as np
import statsmodels.formula.api as smf
import statsmodels.stats.multicomp as multi 

This part saves the HTML locally so we can access the same data shown in this notebook.

In [117]:
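url = 'https://worldpopulationreview.com/country-rankings/crime-rate-by-country'
response = requests.get(url)
response.raise_for_status()  # stop here if the download failed

# Save a local snapshot so later cells do not depend on the live page
with open('crime_rate.html', 'w', encoding='utf-8') as f:
    f.write(response.text)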

In [118]:
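url = 'https://en.wikipedia.org/wiki/Overview_of_gun_laws_by_nation'
response = requests.get(url)
response.raise_for_status()  # stop here if the download failed

# Save a local snapshot so later cells do not depend on the live page
with open('gun_law.html', 'w', encoding='utf-8') as f:
    f.write(response.text)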

We'll read the crime index table with pandas

In [119]:
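crime_all_tables = pd.read_html('crime_rate.html')
crime_all_tables[0].columns[2]
# The header ends with a mis-decoded down-arrow glyph (the page's sort marker),
# so the column name is copied exactly as pandas reports it.
crime_df = crime_all_tables[0][['Country','Crime Index (Numbeo 2024) (1-100)â\x86\x93']]
crime_df.columns = ['Country','Crime Index']
crime_df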

,Country,Crime Index
0,Venezuela,81.2
1,Papua New Guinea,79.7
2,Afghanistan,78.3
3,Haiti,77.9
4,South Africa,75.4
...,...,...
136,Oman,19.0
137,Taiwan,16.7
138,Qatar,16.0
139,United Arab Emirates,15.6


We'll read the law regulations table with pandas

In [175]:
law_all_tables = pd.read_html('gun_law.html')

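# Three columns are renamed below, so three must be selected here
# ('Personal protection' is assumed to be the header used in the Wikipedia table)
law_df = law_all_tables[0][['Region','Good reason','Personal protection']][:-1].copy()
law_df.columns = ['Region','Good Reason','Personal Protection']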

def is_yes(string):
    if 'yes' in str(string).lower():
        return 1
    else:
        return 0
law_df['Yes for PP'] = law_df['Personal Protection'].apply(is_yes)
law_df['Yes for GR'] = law_df['Good Reason'].apply(is_yes)
# Two-character group code: first digit = PP, second digit = GR (e.g. '01' = no PP, yes GR)
law_df['Yes for PP&GR'] = law_df['Yes for PP'].astype(str)+law_df['Yes for GR'].astype(str)
law_df

,Region,Good Reason,Personal Protection,Yes for PP,Yes for GR,Yes for PP&GR
0,Afghanistan[12][law 1],Not for shotguns and antique firearms,Restricted,0,0,00
1,Albania[law 2],Yes – hunting and sport shooting,Proof of threat to life required,0,1,01
2,Algeria[13],Yes – hunting (restricted),No,0,1,01
3,Andorra[law 3],No (with exceptions)Exceptions ISSF-approved p...,Yes – home defense,1,0,10
4,Angola[14],Private security companies only,Private security companies only,0,0,00
...,...,...,...,...,...,...
216,Puerto Rico[law 94][154],No,Yes,1,0,10
217,American Samoa,Yes – plantation protection and hunting[N 24],No,0,1,01
218,Somaliland,Justification required for more than 1 gun of ...,Justification required for more than 1 gun of ...,0,0,00
219,U.S. Virgin Islands,Yes – farming and sport shooting,Yes (handguns only),1,1,11
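The `is_yes` helper is a plain substring match, so it helps to see how it treats some of the cell values shown in the output above. The sketch below restates the function and runs it on strings copied from that output. The last sample is a made-up string, included only to show that any cell containing the letters "yes" is flagged.

```python
def is_yes(string):
    """Return 1 if the cell text contains 'yes' (case-insensitive), else 0."""
    return 1 if 'yes' in str(string).lower() else 0

samples = [
    'Yes – hunting and sport shooting',   # from the table above
    'Yes (handguns only)',                # from the table above
    'No',                                 # from the table above
    'Restricted',                         # from the table above
    'Proof of threat to life required',   # from the table above
    'Private security companies only',    # from the table above
    'No, but yes for collectors',         # hypothetical string, not in the data
]

for s in samples:
    print(is_yes(s), '<-', s)
```

Rows such as "Restricted" or "Proof of threat to life required" are counted as 0, which is a coding choice rather than a legal classification.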


The United States row doesn't contain a literal 'Yes', so we'll fix it manually

In [176]:
# Fix for the US
us_idx = law_df.index[law_df['Region'] == 'United States']

law_df.loc[us_idx, 'Yes for PP'] = 1
law_df.loc[us_idx, 'Yes for GR'] = 1
law_df.loc[us_idx, 'Yes for PP&GR'] = '11'

Some countries are also named differently in the two tables, so let's normalize them

In [177]:
def delete_sq(string):
    # Strip footnote markers like '[law 2]', parenthetical notes and ' -' suffixes
    return string.split('[')[0].split(' (')[0].split(' -')[0].replace('Vietnam ','Vietnam')

law_df['Region'] = law_df['Region'].apply(delete_sq)


replacements = {
        "DR Congo": "Democratic Republic of the Congo",
        "Congo": "Republic of the Congo",
        "Cape Verde": "Cabo Verde",
        "Swaziland": "Eswatini",
        "St Vincent & Grenadines": "Saint Vincent and the Grenadines",
        "São Tomé and Príncipe": "Sao Tome and Principe",
        "Timor-Leste": "East Timor",
        "Gaza Strip": "Palestine",
        "West Bank": "Palestine",
        "Ivory Coast": "Côte d'Ivoire",
        "Micronesia": "Federated States of Micronesia"
    }

def normalize_country_name(country_name):    
    return replacements.get(country_name, country_name)

law_df['Region'] = law_df['Region'].apply(normalize_country_name)

law_df = law_df[law_df['Region'] != 'Region']
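As a quick illustration of the two normalization steps, the sketch below applies `delete_sq` and `normalize_country_name` to a few raw region strings. The first three are copied from the table output earlier in the notebook; `'Swaziland[99]'` is a hypothetical raw value, and the `replacements` dictionary is trimmed to two entries for brevity.

```python
def delete_sq(string):
    # Strip footnote markers like '[law 2]', parenthetical notes and ' -' suffixes
    return string.split('[')[0].split(' (')[0].split(' -')[0].replace('Vietnam ', 'Vietnam')

# Subset of the notebook's mapping, kept short for the example
replacements = {
    "Swaziland": "Eswatini",
    "Cape Verde": "Cabo Verde",
}

def normalize_country_name(country_name):
    return replacements.get(country_name, country_name)

raw_regions = [
    'Afghanistan[12][law 1]',    # from the table output
    'Puerto Rico[law 94][154]',  # from the table output
    'Somaliland',                # from the table output
    'Swaziland[99]',             # hypothetical raw value
]

cleaned = [normalize_country_name(delete_sq(r)) for r in raw_regions]
for before, after in zip(raw_regions, cleaned):
    print(f"{before!r} -> {after!r}")
```

Because both "Gaza Strip" and "West Bank" map to "Palestine" in the full dictionary, the cleaned table can contain more than one row for the same name.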

Merge the two tables

In [178]:
df = pd.merge(crime_df,law_df,how='inner',left_on='Country',right_on='Region')
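An inner merge silently drops any country whose name does not match in both tables, and duplicated names on one side produce duplicated rows. A minimal sketch of how this could be checked is shown below, using small made-up frames in place of `crime_df` and `law_df` (the country names and values are placeholders, not real data). It relies on the `indicator=True` option of `pd.merge`.

```python
import pandas as pd

# Toy stand-ins for crime_df and law_df; all values are invented for illustration
crime_toy = pd.DataFrame({
    'Country': ['Alpha', 'Beta', 'Gamma'],
    'Crime Index': [50.0, 40.0, 30.0],
})
law_toy = pd.DataFrame({
    'Region': ['Alpha', 'Beta', 'Beta', 'Delta'],
    'Yes for PP': [1, 0, 1, 0],
})

# An outer merge with indicator=True labels each row by where it came from
check = pd.merge(crime_toy, law_toy, how='outer',
                 left_on='Country', right_on='Region', indicator=True)

# Names present in only one of the two tables (these are lost by an inner merge)
left_only = check.loc[check['_merge'] == 'left_only', 'Country'].tolist()
right_only = check.loc[check['_merge'] == 'right_only', 'Region'].tolist()

# Names that appear more than once in the law table (these duplicate merged rows)
dupes = law_toy.loc[law_toy['Region'].duplicated(keep=False), 'Region'].unique().tolist()

print('Only in crime table:', left_only)
print('Only in law table:  ', right_only)
print('Duplicated regions: ', dupes)
```

Running the same check on the real frames would list the names that still need an entry in `replacements`.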

In [179]:
df = df[['Crime Index','Yes for PP','Yes for GR','Yes for PP&GR']]
df.columns = ['Crime_Index','Yes_for_PP','Yes_for_GR','Yes_for_PP&GR']

Let's compare the group means

In [180]:
df.groupby('Yes_for_PP')['Crime_Index'].agg(['mean', 'std'])

Yes_for_PP,mean,std
0,43.683333,15.681743
1,48.284483,14.245773


In [181]:
df.groupby('Yes_for_GR')['Crime_Index'].agg(['mean', 'std'])

Yes_for_GR,mean,std
0,48.196825,16.679454
1,43.462025,13.714472


In [182]:
df.groupby('Yes_for_PP&GR')['Crime_Index'].agg(['mean', 'std'])

Yes_for_PP&GR,mean,std
00,48.165,17.217709
01,39.609091,13.043573
10,48.252174,16.07796
11,48.305714,13.151425


Fit an OLS regression model

In [125]:
model1 = smf.ols(formula='Crime_Index ~ C(Yes_for_PP)', data=df)
results1 = model1.fit()
print(results1.summary())

                            OLS Regression Results                            
Dep. Variable:            Crime_Index   R-squared:                       0.022
Model:                            OLS   Adj. R-squared:                  0.015
Method:                 Least Squares   F-statistic:                     3.180
Date:                Tue, 12 Aug 2025   Prob (F-statistic):             0.0767
Time:                        11:31:01   Log-Likelihood:                -586.10
No. Observations:                 142   AIC:                             1176.
Df Residuals:                     140   BIC:                             1182.
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                         coef    std err          t      P>|t|      [0.025      0.975]
--------------------------------------------------------------------------------------
Intercept             43.6833      1

### Interpretation - Yes for PP

With **p = 0.0767**, which is above the usual 0.05 threshold, we *fail to reject the null hypothesis* that the mean crime index is the same for countries that do and do not allow guns for personal protection. The model explains only about 2% of the variance (R-squared = 0.022), so we need to look at more variables to explain the crime rate.
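For an OLS model with a single binary predictor under the default treatment coding, the intercept equals the mean of the reference group and the slope equals the difference between the two group means. The sketch below checks this against the group means printed by the `groupby` cell for `Yes_for_PP`; the label `C(Yes_for_PP)[T.1]` is the name the formula interface would normally give the slope, stated here as an assumption because the summary output above is cut off.

```python
# Group means copied from the groupby output for Yes_for_PP
mean_pp0 = 43.683333  # countries without "Yes" for personal protection
mean_pp1 = 48.284483  # countries with "Yes" for personal protection

# With one 0/1 predictor, OLS reproduces the group means exactly
intercept = mean_pp0            # mean of the reference group (Yes_for_PP = 0)
slope = mean_pp1 - mean_pp0     # difference in means between the two groups

print(f"Intercept:           {intercept:.4f}")
print(f"C(Yes_for_PP)[T.1]:  {slope:.4f}")  # assumed coefficient label
```

The intercept of 43.6833 in the truncated summary matches the mean of group 0, and the gap of roughly 4.6 index points is the effect being tested.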

In [186]:
model2 = smf.ols(formula='Crime_Index ~ C(Yes_for_GR)', data=df)
results2 = model2.fit()
print(results2.summary())

                            OLS Regression Results                            
Dep. Variable:            Crime_Index   R-squared:                       0.024
Model:                            OLS   Adj. R-squared:                  0.017
Method:                 Least Squares   F-statistic:                     3.446
Date:                Tue, 12 Aug 2025   Prob (F-statistic):             0.0655
Time:                        12:10:08   Log-Likelihood:                -585.96
No. Observations:                 142   AIC:                             1176.
Df Residuals:                     140   BIC:                             1182.
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                         coef    std err          t      P>|t|      [0.025      0.975]
--------------------------------------------------------------------------------------
Intercept             48.1968      1

### Interpretation - Yes for GR

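With **p = 0.0655**, which is above the usual 0.05 threshold, we *fail to reject the null hypothesis* that the mean crime index is the same for countries that do and do not accept a good reason for gun ownership. The model explains only about 2% of the variance (R-squared = 0.024), so we need to look at more variables to explain the crime rate.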
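The F-statistic in each summary can be recovered from R-squared and the degrees of freedom through F = (R² / df_model) / ((1 − R²) / df_resid). The sketch below applies that identity to the values printed in the two summaries. The helper name `f_from_r2` is introduced here for illustration, and small gaps from the reported values are expected because R-squared is printed with only three decimals.

```python
def f_from_r2(r2, df_model, df_resid):
    """Overall F-statistic of an OLS model computed from its R-squared."""
    return (r2 / df_model) / ((1.0 - r2) / df_resid)

# (label, R-squared, reported F) copied from the two OLS summaries above
models = [
    ('Crime_Index ~ C(Yes_for_PP)', 0.022, 3.180),
    ('Crime_Index ~ C(Yes_for_GR)', 0.024, 3.446),
]

for label, r2, reported_f in models:
    # Both models report Df Model = 1 and Df Residuals = 140
    approx_f = f_from_r2(r2, df_model=1, df_resid=140)
    print(f"{label}: F from rounded R2 = {approx_f:.3f}, reported F = {reported_f:.3f}")
```

Both models have low F-statistics because each binary flag accounts for only a small share of the variance in the crime index.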

Run Tukey's HSD test

In [191]:
mc1 = multi.MultiComparison(df['Crime_Index'], df['Yes_for_PP&GR'])
res1 = mc1.tukeyhsd()
print(res1.summary())

 Multiple Comparison of Means - Tukey HSD, FWER=0.05 
group1 group2 meandiff p-adj   lower    upper  reject
-----------------------------------------------------
    00     01  -8.5559 0.0456 -16.9946 -0.1172   True
    00     10   0.0872    1.0 -10.0209 10.1953  False
    00     11   0.1407    1.0  -8.7998  9.0812  False
    01     10   8.6431 0.1122  -1.2959  18.582  False
    01     11   8.6966  0.052  -0.0521 17.4454  False
    10     11   0.0535    1.0 -10.3148 10.4219  False
-----------------------------------------------------


### Interpretation - Yes for PP&GR

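The pairwise comparison gives a more interesting result. Although neither “Yes” for Personal Protection nor “Yes” for Good Reason is significant on its own at the 5% level, the Tukey test finds one statistically significant pair: group `01` (“No” for Personal Protection, “Yes” for Good Reason) has a mean crime index about 8.6 points lower than group `00` (“No” for both), with p-adj = 0.0456. The `01` vs `11` comparison is borderline (p-adj = 0.052) and is not rejected.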
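To read the Tukey table it helps to decode the two-character group labels (first digit is Personal Protection, second is Good Reason, following the order in which the code concatenates them) and to recompute the `meandiff` column from the group means. The sketch below does both with the means from the `groupby` output, assuming that its labels `0` and `1` stand for `00` and `01` with the leading zero dropped on display.

```python
from itertools import combinations

# Group means from the groupby output, with two-character labels restored
group_means = {
    '00': 48.165,
    '01': 39.609091,
    '10': 48.252174,
    '11': 48.305714,
}

def decode(label):
    """Translate a code like '01' into its Personal Protection / Good Reason flags."""
    pp, gr = label
    return f"PP={'Yes' if pp == '1' else 'No'}, GR={'Yes' if gr == '1' else 'No'}"

# Tukey's meandiff is mean(group2) - mean(group1) for every pair of groups
diffs = {}
for g1, g2 in combinations(sorted(group_means), 2):
    diffs[(g1, g2)] = group_means[g2] - group_means[g1]
    print(f"{g1} ({decode(g1)}) vs {g2} ({decode(g2)}): meandiff = {diffs[(g1, g2)]:.4f}")
```

The recomputed differences agree with the `meandiff` column, which confirms that the only rejected pair compares `00` with `01`.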

## Important Note

The original datasets were not properly wrangled, and there are significant differences in the interpretation of gun laws. This notebook is intended for exercise purposes only and not for scientific research.