# Reading in a Dataset and Gathering Basic Information
In this lecture, we will cover how to read in CSV data. CSVs store tabular data organized in rows and columns where, typically, each row is an observation and each column is a variable that you collected data on. The data frames that we've been building from scratch in lectures preceding this one are in a tabular format.

There are other common types of files:
- Excel files (which are also tabular data)
- Shapefiles (for geographic and spatial data)
- Columnar files such as Parquet (also tabular, but stored column by column, which is usually more efficient to store and read)

These other files can easily be worked with in R or Python. We will revisit these file types later in this course.

Today, after we read in our CSV, we will gather basic information about the dataset and discuss basic functions for inspecting its properties, dimensions, data types, and summary statistics. Additionally, we will introduce read-write functions, discuss the cost of holding data in RAM, check resource allocation, and explore lazy-loading options.


In [1]:
# first, some quick housekeeping

# install libraries if needed
# (requireNamespace() checks for a package without attaching it,
#  so this must run before any library() call)
if (!requireNamespace("dplyr", quietly = TRUE)) install.packages("dplyr")
if (!requireNamespace("readr", quietly = TRUE)) install.packages("readr")
if (!requireNamespace("vroom", quietly = TRUE)) install.packages("vroom")
if (!requireNamespace("ggplot2", quietly = TRUE)) install.packages("ggplot2")

# load the libraries that we will use today, hiding their startup messages
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(readr))
suppressPackageStartupMessages(library(vroom))
suppressPackageStartupMessages(library(ggplot2))


# Reading in CSV Data
There are a million ways to read in CSVs. Let's talk about a few.
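
As a minimal sketch of three common options, the block below reads the same file with base R's `read.csv()`, readr's `read_csv()`, and vroom's `vroom()`. The lecture's actual CSV is not named here, so the example assumes a stand-in: it writes the built-in `ggplot2::mpg` data to a temporary file (`csv_path` is a hypothetical name) and reads that back.

```r
library(readr)
library(vroom)

# Assumption: no real CSV path is given, so we create a temporary one
# from the built-in mpg dataset to keep this example self-contained.
csv_path <- tempfile(fileext = ".csv")
write_csv(ggplot2::mpg, csv_path)

# Option 1: base R, no extra packages needed; returns a plain data.frame
df_base <- read.csv(csv_path)

# Option 2: readr; returns a tibble and guesses column types
df_readr <- read_csv(csv_path, show_col_types = FALSE)

# Option 3: vroom; indexes the file and reads values lazily where it can,
# which tends to help most with large files
df_vroom <- vroom(csv_path, delim = ",", show_col_types = FALSE)

# All three should agree on the shape of the data
dim(df_base)
dim(df_readr)
dim(df_vroom)
```

For a small file like this one, the three readers should behave much the same; the differences in speed and memory use tend to show up only as files get larger.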

*insert here*

In [None]:
# in case of emergency with loading csv, uncomment the following line and run it
# df <- ggplot2::mpg

# Getting basic info about the data frame 

Here are some good ways to get basic information about a dataframe in R:

- `head()`: Displays the first few rows of the dataframe.
- `tail()`: Displays the last few rows of the dataframe.
- `dim()`: Returns the dimensions of the dataframe (number of rows and columns).
- `nrow()`: Returns the number of rows in the dataframe.
- `ncol()`: Returns the number of columns in the dataframe.
- `names()`: Returns the column names of the dataframe.
- `str()`: Displays the structure of the dataframe, including data types and a preview of the data.
- `summary()`: Provides summary statistics for each column in the dataframe.
- `glimpse()`: Similar to `str()`, but provides a more readable output (requires the dplyr package).

Let's run a few of these.
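
As a rough sketch, the functions listed above can be called one after another on any data frame. The block below assumes `df` is the built-in `ggplot2::mpg` data (the same fallback used in the emergency cell), which may differ from the CSV used in lecture.

```r
library(dplyr)

# Assumption: use the built-in mpg dataset as a stand-in for the lecture CSV
df <- ggplot2::mpg

head(df, 3)      # first 3 rows
tail(df, 3)      # last 3 rows
dim(df)          # rows and columns together
nrow(df)         # number of rows
ncol(df)         # number of columns
names(df)        # column names
str(df)          # structure: types plus a preview of each column
summary(df)      # summary statistics for each column
glimpse(df)      # like str(), one column per line (from dplyr)
```

Running `dim()` first is often a cheap sanity check that the file loaded with the number of rows and columns you expected.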

*insert here*