# Exploratory Data Analysis: Hands on Data
Today we will be demonstrating the following key exploratory data analysis techniques using an example dataset:
**Agenda:**
1. Importing libraries & packages
2. Importing tabular data to a DataFrame
3. Inspecting DataFrame structure
4. Concatenation
5. Renaming columns
6. Exploring values
7. Handling NaNs and Nulls
8. Plotting



## The Data
Our example dataset consists of daily summaries of air quality data from Providence, RI. It will give you some experience working with temporal data.

The Rhode Island Department of Environmental Management (RIDEM) and the Rhode Island Department of Health (RIDOH) collect air quality data at several sites across Rhode Island. We will be examining data from one site at the Community College of Rhode Island (CCRI) Liston Campus. Here's some background:

* The CCRI site is part of the EPA's *State or Local Air Monitoring Stations* (SLAMS) and *National Air Toxics Trends Sites* (NATTS) networks.
* A variety of air pollutants (particulate matter (PM), volatile organic compounds (VOCs), polycyclic aromatic hydrocarbons (PAHs), carbonyls, black carbon) have been monitored at this site since 2005.
* A reference for some of the dataset's [field descriptions](https://aqs.epa.gov/aqsweb/airdata/FileFormats.html#_daily_summary_files).
* The data was obtained from the Environmental Protection Agency (EPA) [Air Quality Data website](https://www.epa.gov/outdoor-air-quality-data).
    <div>
    <img src="https://github.com/brown-ccv/ccv-bootcamp-python-2023/blob/main/notebooks/images/aq-site-info.png" width="400"/>
    </div>

We will use a subset of this data in the demonstrations below and give you a chance to work with a larger dataset during the hands-on lab.

*Links*
* [EPA Air Quality Data Interactive Map](https://www.epa.gov/outdoor-air-quality-data/interactive-map-air-quality-monitors) - Data source
* [RIDEM 2022 Annual Monitoring Report](https://dem.ri.gov/sites/g/files/xkgbur861/files/2023-01/airnet22.pdf) - More information about the site and other monitoring locations across the state.

---

## 1. Importing libraries & packages
Import statements typically appear at the top of the file.
* `import <package_name>` is the most basic command.
* A package can be imported with an alias to reduce verbosity. Common packages often have a conventional alias.
```python
import pandas
pandas.read_csv(path)

# Versus importing with an alias

import pandas as pd
pd.read_csv(path)
```



In [16]:
import pandas as pd  # Import the pandas library with the alias 'pd'
import matplotlib.pyplot as plt  # Import the pyplot sub-package from matplotlib with the alias 'plt'
from pathlib import Path  # Import the filesystem path class, for easier pathing to files and outputs

# Magic command for jupyter notebook to generate figures within the notebook
%matplotlib inline

## 2. Importing tabular data to a DataFrame
The pandas package reads tabular data into a data structure called a `DataFrame`. Some examples of read functions are below:
* [`pd.read_csv`](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html) - Comma-delimited or other delimited files
* [`pd.read_fwf`](https://pandas.pydata.org/docs/reference/api/pandas.read_fwf.html#) - Fixed width files
* [`pd.read_excel`](https://pandas.pydata.org/docs/reference/api/pandas.read_excel.html) - Microsoft Excel files
* [`pd.read_sql`](https://pandas.pydata.org/docs/reference/api/pandas.read_sql.html) - SQL query or database table
* See [pandas I/O documentation](https://pandas.pydata.org/docs/reference/io.html#input-output) for more examples

We will be working with `pd.read_csv()` because our data is comma-delimited. This function reads comma-delimited files by default, but it can be used on any delimited text file when the separator is specified.
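As a minimal sketch of specifying a separator, the example below reads semicolon-delimited text. The text, the column names, and the values are hypothetical stand-ins for a file on disk; only `pd.read_csv` and its `sep` argument come from pandas.

```python
import io

import pandas as pd

# Hypothetical semicolon-delimited text standing in for a file on disk
raw_text = "site;pollutant;value\nCCRI;PM2.5;7.1\nCCRI;Ozone;0.03\n"

# `sep` tells read_csv which character separates the fields
df_semicolon = pd.read_csv(io.StringIO(raw_text), sep=";")

print(df_semicolon.shape)
print(list(df_semicolon.columns))
```

Without `sep=";"`, each line would be read as a single field because no commas are present.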

A. To start, we need to specify the path to our data directory:
```
project
├── data
│   └── raw
│       └── monthly            <- Data is here
│
└── notebooks                  <- Our working directory is here
```
We will be using `Path` from `pathlib` to create our directory path because it standardizes pathing between operating systems. Path separators differ between Unix (Mac & Linux; using `/`) and Windows (using `\`) operating systems. Avoiding hard-coded string paths makes the code portable.

In [3]:
# Initialize a Path object to the data directory 
# `..` indicates to go up one level from `notebooks`, our current working directory
data_path = Path('..', 'data')

# Extend the path to the monthly data
path_to_monthly_data = data_path / 'raw' / 'monthly'
path_to_monthly_data2 = data_path.joinpath('raw', 'monthly')  # Alternative syntax for extending path

print(f'This is the monthly data directory: {path_to_monthly_data}')

This is the monthly data directory: ../data/raw/monthly
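To see why building paths from parts is portable, here is a small sketch using `PurePosixPath` and `PureWindowsPath` from `pathlib`. These classes only format paths and never touch the filesystem, so the comparison can be run on any operating system. The variable names are illustrative.

```python
from pathlib import PurePosixPath, PureWindowsPath

# The same path components, independent of any operating system
parts = ("..", "data", "raw", "monthly")

# Render the components using each operating system's conventions
posix_path = PurePosixPath(*parts)
windows_path = PureWindowsPath(*parts)

print(posix_path)    # ../data/raw/monthly
print(windows_path)  # ..\data\raw\monthly
```

`Path` picks the appropriate flavor for the machine running the code, which is why the notebook never has to spell out a separator.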


Using the path generated above, we will read the first month of data (January 2022).

In [4]:
# Read and save the DataFrame object to a variable 'df_2022_01'
df_2022_01 = pd.read_csv(path_to_monthly_data / 'daily_44_007_0022_2022_01.csv')

## 3. Inspecting DataFrame structure
Now that we have imported the data into a DataFrame, here are some questions we are curious about:
1. Did it import correctly?
2. What does the table look like? Number of rows? Columns?
3. Do we need all the data we are importing?
4. Is the data in the correct format?

We can inspect the DataFrame object by looking at its **attributes** and using DataFrame **methods**.

Here are a few useful **attributes** of a dataframe:
* `.shape`:  Table dimensions
* `.columns`:  Sequence of columns
* `.index`:  Sequence of row indexes/labels
* `.dtypes`: Data types by column

Here are a few useful **methods** to inspect a dataframe:
* `.head()`: Shows the first 5 rows--can change the number by supplying an integer.
* `.tail()`: Shows the last 5 rows--can change the number by supplying an integer.
* `.info()`: Combines several DataFrame attributes to one report.
* `.select_dtypes()`: Useful for viewing only columns of certain data types.


<div class="alert alert-block alert-info">
Python objects may have <b>attributes</b> and <b>methods</b>.

<b>attributes</b> - Are properties of the object. Say that there is a `Person` object; the person's `favorite_food` is one of their attributes.
    
<b>methods</b> - Are functions bound to an object type. They often perform a process that uses the object's properties. A method of a `Person` object could be `write_report`.

Attributes and methods are accessed by using dot (`.`) connectors to the object. The difference is that methods have `()` at the end so arguments can be passed.
    
*Example*:
```Python
George.favorite_food  # Accessing an attribute
>>> 'Pho'
my_report = George.write_report(topic='favorite_food', pages=5)  # Calling a method
```
</div>
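The `George` example can be made runnable with a small class. The `Person` class below is hypothetical and exists only to illustrate the difference between accessing an attribute and calling a method.

```python
class Person:
    """Hypothetical class used only to illustrate attributes vs. methods."""

    def __init__(self, name, favorite_food):
        # Attributes: data stored on the object
        self.name = name
        self.favorite_food = favorite_food

    def write_report(self, topic, pages):
        # Method: a function bound to the object that uses its attributes
        value = getattr(self, topic)
        return f"{self.name}'s {pages}-page report on {topic}: {value}"


george = Person(name="George", favorite_food="Pho")

print(george.favorite_food)  # Accessing an attribute (no parentheses)
my_report = george.write_report(topic="favorite_food", pages=5)  # Calling a method
print(my_report)
```

A DataFrame behaves the same way: `df.shape` is an attribute, while `df.head()` is a method that accepts arguments.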

In [5]:
df_2022_01.head()

Unnamed: 0,State Code,County Code,Site Number,Parameter Code,POC,Latitude,Longitude,Datum,Parameter Name,Duration Description,...,AQI,Daily Criteria Indicator,Tribe Name,State Name,County Name,City Name,Local Site Name,Address,MSA or CBSA Name,Data Source
0,44,7,22,87101,1,41.807469,-71.412968,NAD83,"Particle Number, Total Count",1 HOUR,...,.,Y,,Rhode Island,Providence,Providence,CCRI Liston Campus ROOFTOP,"1 Hilton St, PROVIDENCE RI","Providence-Warwick, RI-MA",AQS Data Mart
1,44,7,22,61107,1,41.807469,-71.412968,NAD83,Std Dev Vt Wind Direction,1 HOUR,...,.,Y,,Rhode Island,Providence,Providence,CCRI Liston Campus ROOFTOP,"1 Hilton St, PROVIDENCE RI","Providence-Warwick, RI-MA",AQS Data Mart
2,44,7,22,62101,1,41.807469,-71.412968,NAD83,Outdoor Temperature,1 HOUR,...,.,Y,,Rhode Island,Providence,Providence,CCRI Liston Campus ROOFTOP,"1 Hilton St, PROVIDENCE RI","Providence-Warwick, RI-MA",AQS Data Mart
3,44,7,22,61104,1,41.807469,-71.412968,NAD83,Wind Direction - Resultant,1 HOUR,...,.,Y,,Rhode Island,Providence,Providence,CCRI Liston Campus ROOFTOP,"1 Hilton St, PROVIDENCE RI","Providence-Warwick, RI-MA",AQS Data Mart
4,44,7,22,84313,1,41.807469,-71.412968,NAD83,Black carbon PM2.5 STP,1 HOUR,...,.,Y,,Rhode Island,Providence,Providence,CCRI Liston Campus ROOFTOP,"1 Hilton St, PROVIDENCE RI","Providence-Warwick, RI-MA",AQS Data Mart


In [6]:
df_2022_01.tail()

Unnamed: 0,State Code,County Code,Site Number,Parameter Code,POC,Latitude,Longitude,Datum,Parameter Name,Duration Description,...,AQI,Daily Criteria Indicator,Tribe Name,State Name,County Name,City Name,Local Site Name,Address,MSA or CBSA Name,Data Source
738,44,7,22,62201,1,41.807469,-71.412968,NAD83,Relative Humidity,1 HOUR,...,.,Y,,Rhode Island,Providence,Providence,CCRI Liston Campus ROOFTOP,"1 Hilton St, PROVIDENCE RI","Providence-Warwick, RI-MA",AQS Data Mart
739,44,7,22,84313,1,41.807469,-71.412968,NAD83,Black carbon PM2.5 STP,1 HOUR,...,.,Y,,Rhode Island,Providence,Providence,CCRI Liston Campus ROOFTOP,"1 Hilton St, PROVIDENCE RI","Providence-Warwick, RI-MA",AQS Data Mart
740,44,7,22,61107,1,41.807469,-71.412968,NAD83,Std Dev Vt Wind Direction,1 HOUR,...,.,Y,,Rhode Island,Providence,Providence,CCRI Liston Campus ROOFTOP,"1 Hilton St, PROVIDENCE RI","Providence-Warwick, RI-MA",AQS Data Mart
741,44,7,22,61103,1,41.807469,-71.412968,NAD83,Wind Speed - Resultant,1 HOUR,...,.,Y,,Rhode Island,Providence,Providence,CCRI Liston Campus ROOFTOP,"1 Hilton St, PROVIDENCE RI","Providence-Warwick, RI-MA",AQS Data Mart
742,44,7,22,61104,1,41.807469,-71.412968,NAD83,Wind Direction - Resultant,1 HOUR,...,.,Y,,Rhode Island,Providence,Providence,CCRI Liston Campus ROOFTOP,"1 Hilton St, PROVIDENCE RI","Providence-Warwick, RI-MA",AQS Data Mart


In [7]:
df_2022_01.shape

(743, 34)

In [8]:
df_2022_01.columns

Index(['State Code', 'County Code', 'Site Number', 'Parameter Code', 'POC',
       'Latitude', 'Longitude', 'Datum', 'Parameter Name',
       'Duration Description', 'Pollutant Standard', 'Date (Local)', 'Year',
       'Day In Year (Local)', 'Units of Measure', 'Exceptional Data Type',
       'Nonreg Observation Count', 'Observation Count', 'Observation Percent',
       'Nonreg Arithmetic Mean', 'Arithmetic Mean',
       'Nonreg First Maximum Value', 'First Maximum Value',
       'First Maximum Hour', 'AQI', 'Daily Criteria Indicator', 'Tribe Name',
       'State Name', 'County Name', 'City Name', 'Local Site Name', 'Address',
       'MSA or CBSA Name', 'Data Source'],
      dtype='object')

In [9]:
df_2022_01.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 743 entries, 0 to 742
Data columns (total 34 columns):
 #   Column                      Non-Null Count  Dtype  
---  ------                      --------------  -----  
 0   State Code                  743 non-null    int64  
 1   County Code                 743 non-null    int64  
 2   Site Number                 743 non-null    int64  
 3   Parameter Code              743 non-null    int64  
 4   POC                         743 non-null    int64  
 5   Latitude                    743 non-null    float64
 6   Longitude                   743 non-null    float64
 7   Datum                       743 non-null    object 
 8   Parameter Name              743 non-null    object 
 9   Duration Description        743 non-null    object 
 10  Pollutant Standard          72 non-null     object 
 11  Date (Local)                743 non-null    object 
 12  Year                        743 non-null    int64  
 13  Day In Year (Local)         743 non

<div class="alert alert-block alert-warning">
What is an "object" dtype?

<b>Short Answer:</b> It is a column of string or mixed data types (e.g. string, ints, floats, etc). Typically object dtype columns from an imported CSV will be a column of strings.

<b>Long Answer:</b> pandas is built on top of the NumPy package. NumPy stores array values using the same number of bytes for each element. Because strings can be of variable length, they do not conform to this fixed-size requirement. Instead, pandas creates an object array of pointers to the strings, and the pointers are all the same size. The same applies to columns with mixtures of data types.
</div>
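A quick way to see the object dtype in action is to build a Series with mixed value types. This is a minimal sketch with made-up values; how pandas labels a column containing only strings can vary between pandas versions, so the example uses mixed types.

```python
import pandas as pd

# A Series of integers gets a fixed-width numeric dtype
numbers = pd.Series([1, 2, 3])

# A Series mixing ints, strings, and floats falls back to the object dtype
mixed = pd.Series([1, "two", 3.0])

print(numbers.dtype)
print(mixed.dtype)

# Each element of an object column keeps its original Python type
element_types = [type(value).__name__ for value in mixed]
print(element_types)
```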

In [10]:
# Inspect Numerical Fields
df_2022_01.select_dtypes(include=['int', 'float']).head(100)

Unnamed: 0,State Code,County Code,Site Number,Parameter Code,POC,Latitude,Longitude,Year,Day In Year (Local),Exceptional Data Type,Nonreg Observation Count,Observation Count,Observation Percent,Nonreg Arithmetic Mean,Arithmetic Mean,Nonreg First Maximum Value,First Maximum Value,First Maximum Hour,Tribe Name
0,44,7,22,87101,1,41.807469,-71.412968,2022,1,,,24,100.0,,7062.208333,,14300.000,17,
1,44,7,22,61107,1,41.807469,-71.412968,2022,1,,,24,100.0,,17.166667,,25.000,7,
2,44,7,22,62101,1,41.807469,-71.412968,2022,1,,,24,100.0,,48.958333,,54.000,15,
3,44,7,22,61104,1,41.807469,-71.412968,2022,1,,,24,100.0,,140.791667,,195.000,15,
4,44,7,22,84313,1,41.807469,-71.412968,2022,1,,,24,100.0,,0.458333,,1.250,1,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
95,44,7,22,17149,6,41.807469,-71.412968,2022,5,,,1,100.0,,1.130000,,1.130,0,
96,44,7,22,17204,6,41.807469,-71.412968,2022,5,,,1,100.0,,0.845000,,0.845,0,
97,44,7,22,17151,6,41.807469,-71.412968,2022,5,,,1,100.0,,0.211000,,0.211,0,
98,44,7,22,17150,6,41.807469,-71.412968,2022,5,,,1,100.0,,3.910000,,3.910,0,


In [11]:
# Inspect Object fields
df_2022_01.select_dtypes(include='object').head(100)

Unnamed: 0,Datum,Parameter Name,Duration Description,Pollutant Standard,Date (Local),Units of Measure,AQI,Daily Criteria Indicator,State Name,County Name,City Name,Local Site Name,Address,MSA or CBSA Name,Data Source
0,NAD83,"Particle Number, Total Count",1 HOUR,,2022-01-01,Count per cm^3,.,Y,Rhode Island,Providence,Providence,CCRI Liston Campus ROOFTOP,"1 Hilton St, PROVIDENCE RI","Providence-Warwick, RI-MA",AQS Data Mart
1,NAD83,Std Dev Vt Wind Direction,1 HOUR,,2022-01-01,Degrees Compass,.,Y,Rhode Island,Providence,Providence,CCRI Liston Campus ROOFTOP,"1 Hilton St, PROVIDENCE RI","Providence-Warwick, RI-MA",AQS Data Mart
2,NAD83,Outdoor Temperature,1 HOUR,,2022-01-01,Degrees Fahrenheit,.,Y,Rhode Island,Providence,Providence,CCRI Liston Campus ROOFTOP,"1 Hilton St, PROVIDENCE RI","Providence-Warwick, RI-MA",AQS Data Mart
3,NAD83,Wind Direction - Resultant,1 HOUR,,2022-01-01,Degrees Compass,.,Y,Rhode Island,Providence,Providence,CCRI Liston Campus ROOFTOP,"1 Hilton St, PROVIDENCE RI","Providence-Warwick, RI-MA",AQS Data Mart
4,NAD83,Black carbon PM2.5 STP,1 HOUR,,2022-01-01,Micrograms/cubic meter (25 C),.,Y,Rhode Island,Providence,Providence,CCRI Liston Campus ROOFTOP,"1 Hilton St, PROVIDENCE RI","Providence-Warwick, RI-MA",AQS Data Mart
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
95,NAD83,Fluorene (TSP) STP,24 HOUR,,2022-01-05,Nanograms/cubic meter (25 C),.,Y,Rhode Island,Providence,Providence,CCRI Liston Campus ROOFTOP,"1 Hilton St, PROVIDENCE RI","Providence-Warwick, RI-MA",AQS Data Mart
96,NAD83,Pyrene (TSP) STP,24 HOUR,,2022-01-05,Nanograms/cubic meter (25 C),.,Y,Rhode Island,Providence,Providence,CCRI Liston Campus ROOFTOP,"1 Hilton St, PROVIDENCE RI","Providence-Warwick, RI-MA",AQS Data Mart
97,NAD83,Anthracene (TSP) STP,24 HOUR,,2022-01-05,Nanograms/cubic meter (25 C),.,Y,Rhode Island,Providence,Providence,CCRI Liston Campus ROOFTOP,"1 Hilton St, PROVIDENCE RI","Providence-Warwick, RI-MA",AQS Data Mart
98,NAD83,Phenanthrene (TSP) STP,24 HOUR,,2022-01-05,Nanograms/cubic meter (25 C),.,Y,Rhode Island,Providence,Providence,CCRI Liston Campus ROOFTOP,"1 Hilton St, PROVIDENCE RI","Providence-Warwick, RI-MA",AQS Data Mart
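The same selection can be tried on a small frame. The sketch below uses two rows copied from the output above plus an all-missing column; `df_small` is an illustrative name. It also shows why a column such as "Tribe Name" appears in the numeric view: a column containing only NaN is stored as `float64`.

```python
import pandas as pd

df_small = pd.DataFrame({
    "Parameter Code": [87101, 62101],
    "Arithmetic Mean": [7062.208333, 48.958333],
    "Tribe Name": [float("nan"), float("nan")],  # All missing, so stored as float64
    "Units of Measure": ["Count per cm^3", "Degrees Fahrenheit"],
})

# 'number' matches every integer and float column
numeric_cols = df_small.select_dtypes(include="number")

# `exclude` returns everything that is not numeric
non_numeric_cols = df_small.select_dtypes(exclude="number")

print(list(numeric_cols.columns))
print(list(non_numeric_cols.columns))
```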


**Back to our questions:**

1. Did it import correctly?
2. What does the table look like? Number of rows? Columns?
3. Do we need all the data we are importing?
4. Is the data in the correct format?

* There are many columns we could drop because they all have the same value such as: "Local Site Name" and "Address". We know we are only working with one site for this analysis so these columns don't provide much value. They are also long string fields that take up more memory. Dropping them would improve performance if this dataset gets really large.
<br>

* The date would be more useful as a datetime data type rather than as a string. This will allow for filtering by time and other useful datetime operations.
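Both observations can be checked in code. The sketch below is a minimal illustration on a hypothetical three-row frame (`df_demo`, not the real dataset): it finds columns that hold a single value and shows the kind of filtering a datetime column allows.

```python
import pandas as pd

# Hypothetical rows for illustration only
df_demo = pd.DataFrame({
    "Local Site Name": ["CCRI Liston Campus ROOFTOP"] * 3,
    "Date (Local)": ["2022-01-01", "2022-01-05", "2022-02-28"],
    "Arithmetic Mean": [0.458333, 1.13, 0.1],
})

# Columns with one distinct value add nothing to a single-site analysis
constant_cols = [
    col for col in df_demo.columns if df_demo[col].nunique(dropna=False) == 1
]
print(constant_cols)

# Converting the strings to datetimes enables the .dt accessor
df_demo["Date (Local)"] = pd.to_datetime(df_demo["Date (Local)"], format="%Y-%m-%d")
january_rows = df_demo[df_demo["Date (Local)"].dt.month == 1]
print(len(january_rows))
```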

We can supply additional arguments to the `read_csv` function to handle these specifications.

In [12]:
# Create a list of the columns we wish to keep
keep_cols = ['Parameter Code', 'POC', 'Parameter Name', 'Duration Description',
             'Pollutant Standard',
             'Date (Local)', 'Year', 'Day In Year (Local)', 'Units of Measure',
             'Exceptional Data Type',
             'Observation Count', 'Observation Percent', 'Arithmetic Mean', 'First Maximum Value',
             'First Maximum Hour', 'AQI', 'Daily Criteria Indicator', ]

# Read in the csv with additional arguments
df_2022_01_curated = pd.read_csv(path_to_monthly_data / 'daily_44_007_0022_2022_01.csv',
                                 usecols=keep_cols,  # Specify columns to keep
                                 parse_dates=['Date (Local)'],  # Specify column to parse as a date instead of string
                                 date_format='%Y-%m-%d',  # Specify the format of date strings
                                 )
df_2022_01_curated.head()

Unnamed: 0,Parameter Code,POC,Parameter Name,Duration Description,Pollutant Standard,Date (Local),Year,Day In Year (Local),Units of Measure,Exceptional Data Type,Observation Count,Observation Percent,Arithmetic Mean,First Maximum Value,First Maximum Hour,AQI,Daily Criteria Indicator
0,87101,1,"Particle Number, Total Count",1 HOUR,,2022-01-01,2022,1,Count per cm^3,,24,100.0,7062.208333,14300.0,17,.,Y
1,61107,1,Std Dev Vt Wind Direction,1 HOUR,,2022-01-01,2022,1,Degrees Compass,,24,100.0,17.166667,25.0,7,.,Y
2,62101,1,Outdoor Temperature,1 HOUR,,2022-01-01,2022,1,Degrees Fahrenheit,,24,100.0,48.958333,54.0,15,.,Y
3,61104,1,Wind Direction - Resultant,1 HOUR,,2022-01-01,2022,1,Degrees Compass,,24,100.0,140.791667,195.0,15,.,Y
4,84313,1,Black carbon PM2.5 STP,1 HOUR,,2022-01-01,2022,1,Micrograms/cubic meter (25 C),,24,100.0,0.458333,1.25,1,.,Y


In [13]:
df_2022_01_curated.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 743 entries, 0 to 742
Data columns (total 17 columns):
 #   Column                    Non-Null Count  Dtype         
---  ------                    --------------  -----         
 0   Parameter Code            743 non-null    int64         
 1   POC                       743 non-null    int64         
 2   Parameter Name            743 non-null    object        
 3   Duration Description      743 non-null    object        
 4   Pollutant Standard        72 non-null     object        
 5   Date (Local)              743 non-null    datetime64[ns]
 6   Year                      743 non-null    int64         
 7   Day In Year (Local)       743 non-null    int64         
 8   Units of Measure          743 non-null    object        
 9   Exceptional Data Type     0 non-null      float64       
 10  Observation Count         743 non-null    int64         
 11  Observation Percent       743 non-null    float64       
 12  Arithmetic Mean       

Great! We've cut down the number of columns and converted the date field to a datetime format!
Next, let's see how we can add more data from other files.

## 4. Concatenation
So far we've worked with one month's worth of data. Let's see how we can combine DataFrames together.

We will be using the [`pd.concat`](https://pandas.pydata.org/docs/reference/api/pandas.concat.html) function to combine two or more DataFrames.


In [14]:
# Read in February data
df_2022_02_curated = pd.read_csv(path_to_monthly_data / 'daily_44_007_0022_2022_02.csv',
                                 usecols=keep_cols,  # Specify columns to keep
                                 parse_dates=['Date (Local)'],  # Specify column to parse as a date instead of string
                                 date_format='%Y-%m-%d',  # Specify the format of date strings
                                 )
list_df_to_concat = [df_2022_01_curated, df_2022_02_curated]  # Make a list of DataFrames 
df_combined = pd.concat(list_df_to_concat)  # Concatenate and save the result as a new DataFrame
df_combined.head()

Unnamed: 0,Parameter Code,POC,Parameter Name,Duration Description,Pollutant Standard,Date (Local),Year,Day In Year (Local),Units of Measure,Exceptional Data Type,Observation Count,Observation Percent,Arithmetic Mean,First Maximum Value,First Maximum Hour,AQI,Daily Criteria Indicator
0,87101,1,"Particle Number, Total Count",1 HOUR,,2022-01-01,2022,1,Count per cm^3,,24,100.0,7062.208333,14300.0,17,.,Y
1,61107,1,Std Dev Vt Wind Direction,1 HOUR,,2022-01-01,2022,1,Degrees Compass,,24,100.0,17.166667,25.0,7,.,Y
2,62101,1,Outdoor Temperature,1 HOUR,,2022-01-01,2022,1,Degrees Fahrenheit,,24,100.0,48.958333,54.0,15,.,Y
3,61104,1,Wind Direction - Resultant,1 HOUR,,2022-01-01,2022,1,Degrees Compass,,24,100.0,140.791667,195.0,15,.,Y
4,84313,1,Black carbon PM2.5 STP,1 HOUR,,2022-01-01,2022,1,Micrograms/cubic meter (25 C),,24,100.0,0.458333,1.25,1,.,Y


In [15]:
df_combined.tail()

Unnamed: 0,Parameter Code,POC,Parameter Name,Duration Description,Pollutant Standard,Date (Local),Year,Day In Year (Local),Units of Measure,Exceptional Data Type,Observation Count,Observation Percent,Arithmetic Mean,First Maximum Value,First Maximum Hour,AQI,Daily Criteria Indicator
694,43802,2,Dichloromethane,24 HOUR,,2022-02-28,2022,59,Parts per billion Carbon,,1,100.0,0.1,0.1,0,.,Y
695,43817,2,Tetrachloroethylene,24 HOUR,,2022-02-28,2022,59,Parts per billion Carbon,,1,100.0,0.0,0.0,0,.,Y
696,43824,2,Trichloroethylene,24 HOUR,,2022-02-28,2022,59,Parts per billion Carbon,,1,100.0,0.0,0.0,0,.,Y
697,45807,2,"1,4-Dichlorobenzene",24 HOUR,,2022-02-28,2022,59,Parts per billion Carbon,,1,100.0,0.0,0.0,0,.,Y
698,43372,2,Methyl tert-butyl ether,24 HOUR,,2022-02-28,2022,59,Parts per billion Carbon,,1,100.0,0.0,0.0,0,.,Y
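Notice that the last row label above is 698 even though the combined DataFrame holds rows from both months: by default, `pd.concat` keeps each input's original index, so labels can repeat. The sketch below uses two tiny hypothetical frames to compare the default behavior with `ignore_index=True`, which builds a fresh index.

```python
import pandas as pd

# Two tiny hypothetical frames standing in for the monthly DataFrames
df_a = pd.DataFrame({"value": [1, 2]})
df_b = pd.DataFrame({"value": [3, 4, 5]})

# Default: each frame keeps its own row labels, so labels repeat
kept_index = pd.concat([df_a, df_b])

# ignore_index=True: row labels are renumbered from 0
new_index = pd.concat([df_a, df_b], ignore_index=True)

print(list(kept_index.index))  # [0, 1, 0, 1, 2]
print(list(new_index.index))   # [0, 1, 2, 3, 4]
```

Whether to renumber is a design choice: repeated labels are harmless for many operations, but selecting by label with `.loc` will return every row that shares that label.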


In [None]:
df_2022_01_curated.query("`Parameter Code` == 88101").filter(
    ['Duration Description', 'Date (Local)', 'Arithmetic Mean', 'First Maximum Value',
     'First Maximum Hour'])
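The cell above chains two DataFrame methods: `.query()` selects rows with a string expression, and `.filter()` keeps only the listed columns. Column names containing spaces must be wrapped in backticks inside the query string. A minimal sketch on a hypothetical frame (`df_q`, with made-up values):

```python
import pandas as pd

# Hypothetical values for illustration only
df_q = pd.DataFrame({
    "Parameter Code": [88101, 62101, 88101],
    "Arithmetic Mean": [5.0, 48.9, 7.5],
    "Units of Measure": ["unit a", "unit b", "unit a"],
})

# Backticks let the query expression refer to a column name with a space
subset = df_q.query("`Parameter Code` == 88101").filter(["Arithmetic Mean"])

print(subset.shape)
print(list(subset.index))
```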

Let's take a look at the online documentation for this function. [`pd.read_csv`](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html)

At the top is the function call signature:
> `pandas.read_csv(filepath_or_buffer, *, sep=_NoDefault.no_default, delimiter=None, header='infer', ...)`
* This demonstrates how to use the function in code with all the available arguments.
* There are two types of arguments: *Positional* and *Keyword*
    1. **Positional arguments** are listed first. They are required and need to be specified in order. In this example there is only one, `filepath_or_buffer`.
    2. **Keyword arguments** are listed after positional arguments and are optional. They have an `=` after the name to denote default values.

    Positional arguments do not need to be specified by name. In this signature, the `*` marks every argument after it as keyword-only, so those arguments must be specified by name.
    ```python
    # Both of these calls are acceptable
    pd.read_csv('data/raw/datafile.csv', sep=',')
    pd.read_csv(filepath_or_buffer='data/raw/datafile.csv', sep=',')
