## Importing libraries & packages
Package imports typically appear at the top of the file.
* `import <package_name>` is the most basic form of the import statement.
* A package can be imported under an alias to reduce verbosity. Common packages often have a conventional alias.
<blockquote>

```python
import pandas
pandas.read_csv(path)

# VS

import pandas as pd
pd.read_csv(path)
```
</blockquote>
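
An alias is simply a second name bound to the same module object, so `pd.read_csv` and `pandas.read_csv` refer to the same function. The minimal sketch below shows the import forms using the standard-library `json` and `xml.etree.ElementTree` modules, chosen here only as stand-ins for pandas and matplotlib so that it runs without extra installs.

```python
import json                          # basic form: binds the name `json`
import json as js                    # same module, bound to the shorter alias `js`
import xml.etree.ElementTree as ET   # sub-module with an alias, like `matplotlib.pyplot as plt`
import xml.etree.ElementTree         # full dotted import, for comparison

# An alias is just another name for the same module object
assert js is json
assert ET is xml.etree.ElementTree

print(js.dumps({"site": "CCRI"}))    # prints {"site": "CCRI"}
```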


```python
import pandas as pd  # Import the pandas library under the alias 'pd'
import matplotlib.pyplot as plt  # Import the pyplot sub-package from the matplotlib library under the alias 'plt'

# Magic command for Jupyter notebooks to render figures within the notebook.
# The comment must sit on its own line: anything after a line magic is passed to it as arguments.
%matplotlib inline
```


## Data
Today we will be using Providence, RI air quality data to demonstrate exploratory data analysis techniques.

The Rhode Island Department of Environmental Management (RIDEM) and Rhode Island Department of Health (RIDOH) collect air quality data at several sites across Rhode Island. We will be examining data from the Community College of Rhode Island (CCRI) Liston Campus site.

* The CCRI site is part of the EPA's *State or Local Air Monitoring Stations* (SLAMS) and *National Air Toxics Trends Sites* (NATTS) networks.
* A variety of air pollutants have been monitored at this site since 2005, including particulate matter (PM), volatile organic compounds (VOCs), polycyclic aromatic hydrocarbons (PAHs), carbonyls, and black carbon.
* The data was obtained from the Environmental Protection Agency (EPA) [Air Quality Data website](https://www.epa.gov/outdoor-air-quality-data).
<div>
<img src="images/aq-site-info.png" width="400"/>
</div>

We will use a subset of this data in the demonstrations below and give you a chance to work with a larger dataset during the hands-on lab.


*Links*
* [EPA Air Quality Data Interactive Map](https://www.epa.gov/outdoor-air-quality-data/interactive-map-air-quality-monitors) - Data source
* [RIDEM 2022 Annual Monitoring Report](https://dem.ri.gov/sites/g/files/xkgbur861/files/2023-01/airnet22.pdf) - More information about the site and other monitoring locations across the state



## Reading files
The pandas package reads tabular data into a data structure called a `DataFrame`. Some examples of read functions are below:

* [`pd.read_csv`](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html) - Comma-delimited or other delimited files
* [`pd.read_fwf`](https://pandas.pydata.org/docs/reference/api/pandas.read_fwf.html#) - Fixed width files
* [`pd.read_excel`](https://pandas.pydata.org/docs/reference/api/pandas.read_excel.html) - Microsoft Excel files
* [`pd.read_sql`](https://pandas.pydata.org/docs/reference/api/pandas.read_sql.html) - SQL query or database table
* See [pandas I/O documentation](https://pandas.pydata.org/docs/reference/io.html#input-output) for more examples
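
Whatever the source format, each reader returns a `DataFrame`. The sketch below loads the same two rows three ways. It assumes made-up PM2.5 values (not real CCRI measurements) and uses `io.StringIO` and an in-memory `sqlite3` database as stand-ins for real files and a real database server; the column widths passed to `colspecs` are specific to this made-up text.

```python
import io
import sqlite3

import pandas as pd

# Hypothetical sample values for illustration only; not real CCRI measurements
csv_text = "date,pm25\n2022-01-01,5.1\n2022-01-02,7.3\n"
fwf_text = (
    "date        pm25\n"
    "2022-01-01   5.1\n"
    "2022-01-02   7.3\n"
)

# Delimited text: io.StringIO stands in for a file path
df_csv = pd.read_csv(io.StringIO(csv_text))

# Fixed-width text: colspecs gives the half-open [start, end) character range of each column
df_fwf = pd.read_fwf(io.StringIO(fwf_text), colspecs=[(0, 10), (10, 16)])

# SQL: an in-memory SQLite database holding the same rows
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE aq (date TEXT, pm25 REAL)")
conn.executemany("INSERT INTO aq VALUES (?, ?)", [("2022-01-01", 5.1), ("2022-01-02", 7.3)])
df_sql = pd.read_sql("SELECT date, pm25 FROM aq", conn)
conn.close()

print(df_csv)
```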

Today we will be working with `pd.read_csv()`. Although this function reads comma-delimited files by default, it can read any delimited text file if the separator is provided through the `sep` keyword argument. Let's take a look at the online documentation for this function: [`pd.read_csv`](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html)
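
To see why the separator matters, the sketch below reads a small semicolon-delimited string, made up for illustration, with and without `sep`. Without it, pandas looks for commas, finds none, and puts each whole line into a single column. A tab-delimited file would use `sep="\t"` in the same way.

```python
import io

import pandas as pd

# Hypothetical semicolon-delimited text; the values are made up
text = "site;pollutant;value\nCCRI;PM2.5;6.2\nCCRI;Ozone;0.031\n"

wrong = pd.read_csv(io.StringIO(text))           # default comma separator: one column
right = pd.read_csv(io.StringIO(text), sep=";")  # explicit separator: three columns

print(wrong.shape, right.shape)
print(right)
```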

At the top is the function call signature:
>pandas.read_csv(filepath_or_buffer, *, sep=_NoDefault.no_default, delimiter=None, header='infer', ...)
* This shows how to call the function, listing every available argument and keyword argument.
* Required arguments are listed first. These are the argument names without an `=` sign. The function will not run without these arguments. Here there is only one, `filepath_or_buffer`.
* Keyword arguments are listed after required arguments and are optional. Each is followed by `=` and its default value, which is used when the argument is not passed. The bare `*` marks every argument after it as keyword-only: it must be passed by name (e.g. `sep=';'`), not by position.
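
The same rules apply to any Python function. The minimal sketch below uses a hypothetical function, `read_data`, written only to mirror the shape of the `read_csv` signature (one required argument, then keyword-only arguments with defaults); it does not read any data.

```python
def read_data(filepath_or_buffer, *, sep=",", header="infer"):
    """Hypothetical function mirroring the shape of the read_csv signature."""
    return filepath_or_buffer, sep, header

print(read_data("data.csv"))              # defaults used: ('data.csv', ',', 'infer')
print(read_data("data.csv", sep="\t"))    # a keyword argument overrides its default

# Omitting the required argument raises a TypeError
try:
    read_data()
except TypeError as err:
    print("TypeError:", err)

# Passing `sep` by position also fails: everything after `*` is keyword-only
try:
    read_data("data.csv", ";")
except TypeError as err:
    print("TypeError:", err)
```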

```python
help(pd.read_csv)
```

```
Help on function read_csv in module pandas.io.parsers.readers:

read_csv(filepath_or_buffer: 'FilePath | ReadCsvBuffer[bytes] | ReadCsvBuffer[str]', *, sep: 'str | None | lib.NoDefault' = <no_default>, delimiter: 'str | None | lib.NoDefault' = None, header: "int | Sequence[int] | None | Literal['infer']" = 'infer', names: 'Sequence[Hashable] | None | lib.NoDefault' = <no_default>, index_col: 'IndexLabel | Literal[False] | None' = None, usecols=None, dtype: 'DtypeArg | None' = None, engine: 'CSVEngine | None' = None, converters=None, true_values=None, false_values=None, skipinitialspace: 'bool' = False, skiprows=None, skipfooter: 'int' = 0, nrows: 'int | None' = None, na_values=None, keep_default_na: 'bool' = True, na_filter: 'bool' = True, verbose: 'bool' = False, skip_blank_lines: 'bool' = True, parse_dates: 'bool | Sequence[Hashable] | None' = None, infer_datetime_format: 'bool | lib.NoDefault' = <no_default>, keep_date_col: 'bool' = False, date_parser=<no_default>, date_format: 'str
```

Wow. That's a lot. Let's break down the information.

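The top of the `help()` output shows the same function call signature as the online documentation, now with a type hint after each parameter name (for example, `sep: 'str | None | lib.NoDefault'`). The docstring that follows describes each parameter in turn.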
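
When the `help()` text is too long to scan, the same information can be pulled out programmatically with the standard-library `inspect` module. A small sketch (the variable names are illustrative): a required parameter is one with no default value, and keyword-only parameters are those listed after the bare `*`.

```python
import inspect

import pandas as pd

sig = inspect.signature(pd.read_csv)

# Required parameters: no default value (ignoring any *args / **kwargs catch-alls)
required = [
    name
    for name, p in sig.parameters.items()
    if p.default is inspect.Parameter.empty
    and p.kind not in (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD)
]

# Keyword-only parameters: everything after the bare `*` in the signature
keyword_only = [
    name for name, p in sig.parameters.items() if p.kind == inspect.Parameter.KEYWORD_ONLY
]

print("Required:", required)
print(len(sig.parameters), "parameters in total,", len(keyword_only), "keyword-only")
print("Default for sep:", sig.parameters["sep"].default)
```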


