# National Public Opinion Reference Survey (NPORS) Analysis

Jennifer Le  
12/6/24

The purpose of this analysis is to analyze public opinion on religion and the economy from 2020 to 2024, using data from the NPORS surveys, which have been conducted annually since 2020.

Some questions this analysis will answer are:

1. How has public opinion on the economy changed since 2020?
2. Add more questions here

[More information on NPORS surveys](https://www.pewresearch.org/methods/fact-sheet/national-public-opinion-reference-survey-npors/) 

## Import Datasets 

In [66]:
import pandas as pd
import seaborn as sns
%matplotlib inline
import matplotlib.pyplot as plt

# Load each year's SPSS (.sav) file into a DataFrame
df_20 = pd.read_spss("Datasets/NPORS-2020/dataset.sav")
df_21 = pd.read_spss("Datasets/NPORS-2021/dataset.sav")
df_22 = pd.read_spss("Datasets/NPORS-2022/dataset.sav")
df_23 = pd.read_spss("Datasets/NPORS-2023/dataset.sav")
df_24 = pd.read_spss("Datasets/NPORS-2024/dataset.sav")

# Optional: export each DataFrame to .csv (uncomment to run)
# df_20.to_csv("Datasets/NPORS-2020/dataset.csv", index=False)
# df_21.to_csv("Datasets/NPORS-2021/dataset.csv", index=False)
# df_22.to_csv("Datasets/NPORS-2022/dataset.csv", index=False)
# df_23.to_csv("Datasets/NPORS-2023/dataset.csv", index=False)
# df_24.to_csv("Datasets/NPORS-2024/dataset.csv", index=False)

# Print the number of rows and columns 
print("(number of rows, number of columns)")
print("2020 survey: ", df_20.shape)
print("2021 survey: ", df_21.shape)
print("2022 survey: ", df_22.shape)
print("2023 survey: ", df_23.shape)
print("2024 survey: ", df_24.shape)

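def remove_year_from_column_name(dataset, stop):
    """Strip the trailing 5 characters (the year suffix) from every column
    name except the last `stop` columns. Modifies `dataset` in place."""
    list_columns = list(dataset.columns)

    # The last `stop` column names are left unchanged
    for i in range(len(list_columns) - stop):
        list_columns[i] = list_columns[i][:-5]

    dataset.columns = list_columns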

remove_year_from_column_name(df_20, 5)
remove_year_from_column_name(df_21, 2)

print(list(df_20.columns))
print(list(df_21.columns))

(number of rows, number of columns)
2020 survey:  (4108, 61)
2021 survey:  (3937, 66)
2022 survey:  (4043, 72)
2023 survey:  (5733, 80)
2024 survey:  (5626, 59)
['RESPID', 'LANGUAGE', 'INTERVIEW_START', 'INTERVIEW_END', 'ECON1MOD', 'ECON1BMOD', 'TYPOLOGYb', 'COVIDWORK_a', 'COVIDWORK_b', 'COMATTACH', 'VET1', 'VOL12_CPS', 'HLTHRATE', 'DISA', 'ROBWRK', 'RESPFUT', 'EMINUSE', 'INTMOB', 'INTFREQ', 'HOMEINTSERV', 'SMUSE_a', 'SMUSE_b', 'SMUSE_c', 'SMUSE_d', 'SMUSE_e', 'SMUSE_f', 'SMUSE_g', 'SMUSE_h', 'BOOKS', 'RADIO', 'DEVICE1a', 'SMART2', 'RELIG', 'BORN', 'ATTEND', 'RELIMP', 'PRAY', 'REG', 'INSURANCE', 'SEXASK', 'MARITAL', 'PARTY', 'PARTYLN', 'HISP', 'RACEMOD_1', 'RACEMOD_2', 'RACEMOD_3', 'RACEMOD_4', 'RACEMOD_5', 'RACEMOD_6', 'RACECMB', 'AGE', 'EDUC_ACS', 'NUMADULTS', 'NATIVITY', 'INCOME', 'REGION_NAME', 'MSA', 'MODE', 'BASEWEIGHT', 'WEIGHT']
['RESPID', 'MODE', 'INTERVIEW_START', 'INTERVIEW_END', 'DATERECEIVED', 'LANG', 'STRATUM', 'ECON1MOD', 'ECON1BMOD', 'TYPOLOGYb', 'GOVSIZE1', 'SOCTRUST',

## Clean and Aggregate Data

The columns we want are   
ECON1MOD  
ECON1BMOD
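
One way to pull these columns out of each year's data frame and stack them into a single table is sketched below. This is a minimal sketch, not the notebook's actual cleaning step: the helper `select_columns`, the added `YEAR` column, and the toy data frames with placeholder responses are all assumptions made for illustration. With the real data, `df_20` through `df_24` would be passed in place of the toy frames, and the helper raises an error if a year's column names do not match (for example, if a year suffix has not been removed yet).

```python
import pandas as pd

# Columns named in this section
ECON_COLUMNS = ["ECON1MOD", "ECON1BMOD"]


def select_columns(df, year, columns=ECON_COLUMNS):
    """Hypothetical helper: keep only `columns` and tag rows with the survey year."""
    missing = [c for c in columns if c not in df.columns]
    if missing:
        # Fail early if this year's file names the columns differently
        raise KeyError(f"{year} survey is missing columns: {missing}")
    subset = df[list(columns)].copy()
    subset.insert(0, "YEAR", year)
    return subset


# Toy stand-ins for df_20 and df_21; the response values are placeholders,
# not real NPORS answer labels
toy_20 = pd.DataFrame({
    "RESPID": [1, 2],
    "ECON1MOD": ["response A", "response B"],
    "ECON1BMOD": ["response C", "response C"],
})
toy_21 = pd.DataFrame({
    "RESPID": [3],
    "ECON1MOD": ["response A"],
    "ECON1BMOD": ["response D"],
    "GOVSIZE1": ["response E"],
})

# Stack the per-year subsets into one long table
econ = pd.concat(
    [select_columns(toy_20, 2020), select_columns(toy_21, 2021)],
    ignore_index=True,
)
print(econ)
```

Keeping the year as its own column makes it straightforward to group by `YEAR` later when comparing responses across surveys.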
