# An evaluation of people's confidence in press and television in the US from 1998 to 2022

## 1. Abstract

## 2. Introduction

In the modern era of disinformation campaigns, fast-moving news cycles and a volume of media too large to fact-check, one would assume that the press and television play pivotal roles in differentiating fact from fiction. Indeed, their role in society has always been to shape public opinion, influence decision-making and foster discourse [].

Over the past decades, however, we have witnessed significant transformations in the way information is produced, consumed, and perceived []. In particular, trust in news outlets (be it print media or television) has seemingly declined, with people suspecting biases left and right []. Understanding the dynamics of public confidence in these media is crucial for assessing their societal impact and effectiveness as conduits of information.

In no other country is this as apparent as in the US. From providing the breeding ground for QAnon and the resulting January 6th insurrection [] to speculation about "Jewish Space Lasers" [], the US populace has stood out in recent years for its seeming resistance to information.

This paper evaluates people's confidence in the press and television in the United States from 1998 to 2022, as measured by the General Social Survey (GSS), and compares it across several key indicators such as gender, education level and party affiliation.

## 3. Methodology

### 3.1 Data source
The primary dataset used in this study is the General Social Survey (GSS), a nationally representative survey conducted in the United States []. The GSS collects data on a wide range of social, demographic, and attitudinal variables, making it well-suited for examining trends in public opinion over time. I used the GSS data archives spanning the years 1998 to 2022 because this timeframe encapsulates many significant societal and technological changes. Specifically, 1998 is the year in which Google launched its search engine, making the web accessible to the majority of people and thus kick-starting a new era of information exchange [].

### 3.2 Data pre-processing
To pre-process the data, I used the Python programming language to download, unzip and prepare the Stata `dta` files. Utilizing the PyStata library, these individual datasets were then merged into a single large dataset encompassing the entire study period.

### 3.3 Variables of interest
The primary variables of interest in this study are the measures of confidence in the press (`conpress`) and television (`contv`), as captured by survey questions within the GSS. Additionally, demographic characteristics such as gender (`sex`) and education level (`degree`), as well as political affiliation (`partyid`) and socio-economic status (`conrinc`), are included in my analysis.

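As a minimal sketch of how these variables might be handled once the data is loaded into `pandas`, the snippet below keeps only the columns of interest and attaches readable labels to the two confidence items. The helper name `select_variables` and the label mapping are assumptions for illustration: the codes 1 to 3 are assumed to mean "a great deal", "only some" and "hardly any", which should be verified against the GSS codebook for each survey year.

```python
import pandas as pd

# Variables named in this section; the grouping is only for illustration.
CONFIDENCE_VARS = ["conpress", "contv"]
DEMOGRAPHIC_VARS = ["sex", "degree", "partyid", "conrinc"]

# Assumed coding of the confidence items (verify against the GSS codebook).
# The keys are floats because the numeric columns are loaded as floats.
CONFIDENCE_LABELS = {1.0: "a great deal", 2.0: "only some", 3.0: "hardly any"}


def select_variables(df: pd.DataFrame) -> pd.DataFrame:
    """Keep the variables of interest and add labelled confidence columns."""
    subset = df[CONFIDENCE_VARS + DEMOGRAPHIC_VARS].copy()
    for var in CONFIDENCE_VARS:
        # Codes outside the mapping (e.g. missing values) become NaN.
        subset[var + "_label"] = subset[var].map(CONFIDENCE_LABELS)
    return subset
```

Keeping the numeric codes next to the labels leaves the original values available for later correlation analysis.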
## 4. Preparing the data

The pre-processing of the data is done in two steps. First, we need to download and extract the data. This can be done with plain Python. Next, we need to combine the individual Stata datasets into one single dataset. To achieve this, we will be using the `pandas` library as well as the `pystata` library.

### 4.1 Getting the data

To get started with retrieving the data, we need to import some necessary libraries and modules such as `os`, `zipfile` and `urllib`. If it doesn't exist, we also create a directory to store the survey data.

In [1]:
import os
import zipfile
from urllib.request import urlretrieve

data_path = "gss_data"
# Create the data directory if it does not exist yet.
os.makedirs(data_path, exist_ok=True)

Next, we define the URLs and use the `urlretrieve` function to download the data files from the GSS website. The downloaded archives are stored in the previously created directory so that they are available for the subsequent steps.

In [2]:
dataset_urls = [
    ("https://gss.norc.org/documents/stata/1998_stata.zip", 1998),
    ("https://gss.norc.org/documents/stata/2000_stata.zip", 2000),
    ("https://gss.norc.org/documents/stata/2002_stata.zip", 2002),
    ("https://gss.norc.org/documents/stata/2004_stata.zip", 2004),
    ("https://gss.norc.org/documents/stata/2006_stata.zip", 2006),
    ("https://gss.norc.org/documents/stata/2008_stata.zip", 2008),
    ("https://gss.norc.org/documents/stata/2010_stata.zip", 2010),
    ("https://gss.norc.org/documents/stata/2012_stata.zip", 2012),
    ("https://gss.norc.org/documents/stata/2014_stata.zip", 2014),
    ("https://gss.norc.org/documents/stata/2016_stata.zip", 2016),
    ("https://gss.norc.org/documents/stata/2018_stata.zip", 2018),
    ("https://gss.norc.org/documents/stata/2021_stata.zip", 2021),
    ("https://gss.norc.org/documents/stata/2022_stata.zip", 2022)
]

dataset_zips = []
for (url, year) in dataset_urls:
    f, _ = urlretrieve(url, os.path.join(data_path, str(year) + ".zip"))
    dataset_zips.append((f, year))

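The hard-coded list above follows a regular pattern, so it could also be generated. The sketch below is one possible variant, not a replacement for the cell above. It assumes that the URL scheme `https://gss.norc.org/documents/stata/<year>_stata.zip` holds for every wave and that the waves are the ones in the list (every two years from 1998 to 2018, then 2021 and 2022). The names `build_dataset_urls` and `download_missing` and the `fetch` parameter are hypothetical helpers. `download_missing` skips archives that already exist locally, so re-running the notebook does not download everything again.

```python
import os
from urllib.request import urlretrieve

URL_TEMPLATE = "https://gss.norc.org/documents/stata/{year}_stata.zip"
# Biennial waves 1998 to 2018, then 2021 and 2022 (same years as the list above).
SURVEY_YEARS = list(range(1998, 2019, 2)) + [2021, 2022]


def build_dataset_urls(years=SURVEY_YEARS):
    """Return (url, year) tuples in the same shape as `dataset_urls`."""
    return [(URL_TEMPLATE.format(year=year), year) for year in years]


def download_missing(urls, target_dir, fetch=urlretrieve):
    """Download each archive unless a file with that name already exists."""
    os.makedirs(target_dir, exist_ok=True)
    zips = []
    for url, year in urls:
        zip_path = os.path.join(target_dir, f"{year}.zip")
        if not os.path.exists(zip_path):
            fetch(url, zip_path)
        zips.append((zip_path, year))
    return zips
```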
Finally, we can unzip the `dta` files and remove the downloaded `zip` archives.

In [3]:
dataset_files = []
for (zip_path, year) in dataset_zips:
    with zipfile.ZipFile(zip_path) as z:
        # Take the first Stata file in the archive and extract it as "<year>.dta".
        file_to_extract = next(m for m in z.infolist() if m.filename.endswith(".dta"))
        file_to_extract.filename = str(year) + ".dta"
        z.extract(file_to_extract, data_path)
        dataset_files.append(os.path.join(data_path, file_to_extract.filename))

    # The archive is no longer needed once its data file has been extracted.
    os.remove(zip_path)

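The extraction step can also be wrapped in a small function, which makes it easier to test without downloading anything. The following is a sketch under the same assumption as the cell above, namely that each archive contains at least one `.dta` file and that the first one is the survey data. The name `extract_dta` is hypothetical. Instead of renaming the archive member before extraction, it copies the member's contents directly into `<year>.dta`.

```python
import os
import shutil
import zipfile


def extract_dta(zip_path, year, target_dir):
    """Extract the first .dta member of `zip_path` as `<target_dir>/<year>.dta`."""
    with zipfile.ZipFile(zip_path) as archive:
        dta_members = [m for m in archive.namelist() if m.lower().endswith(".dta")]
        if not dta_members:
            raise ValueError(f"no .dta file found in {zip_path}")
        out_path = os.path.join(target_dir, f"{year}.dta")
        # Copy the member's bytes to the target name, ignoring any
        # sub-directory the file had inside the archive.
        with archive.open(dta_members[0]) as src, open(out_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
    return out_path
```

With this helper, `[extract_dta(path, year, data_path) for path, year in dataset_zips]` would yield the same list of file paths as `dataset_files`.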
### 4.2 Combining the datasets

In [4]:
from pystata import config

# Initialize PyStata for Stata/BE ("be") and suppress the startup splash screen.
config.init("be", splash=False)

In [5]:
from pystata import stata
import pandas as pd

# Load the first yearly dataset into Stata; the quotes guard against spaces in the path.
stata.run(f'use "{dataset_files[0]}", clear')

In [6]:
df1 = stata.pdataframe_from_data()

In [7]:
df1[["degree", "educ"]]

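|      | degree | educ |
|------|--------|------|
| 0    | 1.0    | 12.0 |
| 1    | 2.0    | 17.0 |
| 2    | 1.0    | 12.0 |
| 3    | 1.0    | 13.0 |
| 4    | 3.0    | 16.0 |
| ...  | ...    | ...  |
| 2827 | 2.0    | 14.0 |
| 2828 | 1.0    | 12.0 |
| 2829 | 0.0    | 6.0  |
| 2830 | 1.0    | 12.0 |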


In [None]:
pd.concat

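The merge itself is left open in the cell above. One way it could be completed, shown here as a sketch rather than the method used in this paper, is to read each file with `pandas.read_stata` instead of PyStata and to stack the results with `pd.concat`. The function name `combine_datasets`, the added `survey_year` column and the default column list are assumptions. In particular, the sketch assumes that every yearly file contains all requested variables under the same lowercase names.

```python
import pandas as pd

VARIABLES = ["conpress", "contv", "sex", "degree", "partyid", "conrinc"]


def combine_datasets(files_by_year, columns=VARIABLES):
    """Read one .dta file per year and stack them into a single DataFrame."""
    frames = []
    for year, path in sorted(files_by_year.items()):
        # convert_categoricals=False keeps the numeric codes instead of labels.
        frame = pd.read_stata(path, columns=columns, convert_categoricals=False)
        frame["survey_year"] = year  # remember which wave a row came from
        frames.append(frame)
    return pd.concat(frames, ignore_index=True)
```

For example, `combine_datasets({1998: "gss_data/1998.dta", 2000: "gss_data/2000.dta"})` would return one frame with the rows of both waves and a `survey_year` column that distinguishes them.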
## 5. Analyzing the data

### 5.1 Trust in press and TV by year

### 5.2 Trust in press and TV by year by gender

#### 5.2.1 Correlation

### 5.3 Trust in press and TV by year by education levels

#### 5.3.1 Correlation

### 5.4 Trust in press and TV by year by party affiliation

#### 5.4.1 Correlation

### 5.5 Trust in press and TV by year by income

#### 5.5.1 Correlation


## 6. Conclusion

## 7. Sources