# Bayesian Statistics

This notebook shows how to download a dataset in .csv format from a URL and store it on your local drive (in our case, the Google Colab temp folder).

Our references are:
* [Think Bayes: Bayesian Statistics in Python, 2nd Edition - Allen B. Downey - Github](http://allendowney.github.io/ThinkBayes2/index.html)
* [Pandas](https://pandas.pydata.org/), version 2.2.0, date: 20 Jan, 2024
* [Python](https://www.python.org/), version 3.12.1

Our dataset is:
* The University of Chicago's NORC dataset entitled [The General Social Survey](https://gss.norc.org/)

This script will accomplish the following tasks:
* Import the [os.path library (module)](https://docs.python.org/3/library/os.path.html)
* Create a function that uses [urllib.request](https://docs.python.org/3/library/urllib.request.html) to open a URL
* The function calls [urlretrieve](https://docs.python.org/3/library/urllib.request.html#urllib.request.urlretrieve) to download the file, but only if a file with that name does not already exist
* The function stores the path returned by urlretrieve in the variable `local`; the .csv file is saved to the Google Colab temp folder (shown on the left side of the notebook when it's open in Colab)
* Import the Pandas library
* Import the dataset into Pandas
* Display the first five lines of the dataset
* Display the last five lines of the dataset

In [1]:
# Load the data file


# Import the os.path library (module)
from os.path import basename, exists

# Create a function that downloads a URL with urllib.request.urlretrieve,
# skipping the download if the file already exists locally
def download(url):
    filename = basename(url)
    if not exists(filename):
        from urllib.request import urlretrieve
        local, _ = urlretrieve(url, filename)
        print('Downloaded ' + local)

# Target URL
download('https://github.com/AllenDowney/ThinkBayes2/raw/master/data/gss_bayes.csv')

Downloaded gss_bayes.csv
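
The download-once logic in the cell above can be tried without network access. The following is a minimal sketch, not part of the original notebook: `download_to` is a hypothetical helper that adds a destination folder argument and returns a flag, and the sample file and `file://` URL are stand-ins for the real dataset URL.

```python
import tempfile
from os.path import basename, exists, join
from pathlib import Path
from urllib.request import urlretrieve


def download_to(url, folder="."):
    """Download url into folder unless a file with that name is already there.

    Hypothetical helper for illustration. Returns (path, downloaded), where
    downloaded is False when the download was skipped.
    """
    path = join(folder, basename(url))
    if exists(path):
        return path, False
    # urlretrieve returns a (local_filename, headers) tuple
    local, _headers = urlretrieve(url, path)
    return local, True


# Demo with a local file:// URL so that no network access is needed
with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
    sample = Path(src) / "sample.csv"
    sample.write_text("caseid,year\n1,1974\n")
    url = sample.as_uri()

    first = download_to(url, dst)   # copies the file into dst
    second = download_to(url, dst)  # file exists now, so nothing is fetched
    print(first[1], second[1])
```

The second call returns without fetching anything, which is why re-running the download cell in the same Colab session prints nothing after the first run.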


Our next script will:
* Import Pandas
* Import the .csv file into Pandas
* Display the first five lines of the dataset

In [3]:
# Import Pandas
import pandas as pd

# Import the dataset into Pandas
gss = pd.read_csv('gss_bayes.csv')

# Display the first five (default setting) lines of the dataset
gss.head()

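| | caseid | year | age | sex | polviews | partyid | indus10 |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 1974 | 21.0 | 1 | 4.0 | 2.0 | 4970.0 |
| 1 | 2 | 1974 | 41.0 | 1 | 5.0 | 0.0 | 9160.0 |
| 2 | 5 | 1974 | 58.0 | 2 | 6.0 | 1.0 | 2670.0 |
| 3 | 6 | 1974 | 30.0 | 1 | 5.0 | 4.0 | 6870.0 |
| 4 | 7 | 1974 | 48.0 | 1 | 5.0 | 4.0 | 7860.0 |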


Our next script will display the last five lines (the default setting) of the dataset.

In [4]:
# Display the last five (default setting) lines of the dataset
gss.tail()

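| | caseid | year | age | sex | polviews | partyid | indus10 |
|---|---|---|---|---|---|---|---|
| 49285 | 2863 | 2016 | 57.0 | 2 | 1.0 | 0.0 | 7490.0 |
| 49286 | 2864 | 2016 | 77.0 | 1 | 6.0 | 7.0 | 3590.0 |
| 49287 | 2865 | 2016 | 87.0 | 2 | 4.0 | 5.0 | 770.0 |
| 49288 | 2866 | 2016 | 55.0 | 2 | 5.0 | 5.0 | 8680.0 |
| 49289 | 2867 | 2016 | 72.0 | 1 | 5.0 | 3.0 | 5170.0 |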
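
Both `head()` and `tail()` accept an optional row count, so five is only the default. The sketch below is an illustration rather than part of the original notebook: it inlines the five rows from the `gss.head()` output so that it runs without the downloaded file, and it assumes the leftmost column in that output is the default Pandas row index rather than a column in the .csv file.

```python
from io import StringIO

import pandas as pd

# The five rows from the gss.head() output, inlined for a self-contained demo.
# Assumption: the leftmost column of that output is the default row index,
# so it is not part of the .csv text here.
csv_text = """caseid,year,age,sex,polviews,partyid,indus10
1,1974,21.0,1,4.0,2.0,4970.0
2,1974,41.0,1,5.0,0.0,9160.0
5,1974,58.0,2,6.0,1.0,2670.0
6,1974,30.0,1,5.0,4.0,6870.0
7,1974,48.0,1,5.0,4.0,7860.0
"""
sample = pd.read_csv(StringIO(csv_text))

first_two = sample.head(2)   # first two rows instead of the default five
last_three = sample.tail(3)  # last three rows instead of the default five

print(first_two)
print(last_three)
```

The same calls work on the full dataset, for example `gss.head(10)` or `gss.tail(3)`.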
