# `pandas` Part 1: this notebook is a first lesson on `pandas`
## The main objective of this tutorial is to introduce `pandas` and create some DataFrames
>- Pandas is one of, if not the, most popular modules for data analytics/science projects
>- We will pretty much be learning about pandas from here until the final 


# Learning Objectives
## By the end of this tutorial you will be able to:
1. Import the `pandas` module and give it an alias
2. Define a pandas DataFrame and Series
3. Create a pandas DataFrame from scratch
4. Create a pandas DataFrame by reading an Excel file
5. Create a pandas DataFrame by reading a csv file
6. Examine your DataFrames using the `shape` attribute and the `head()` method

## Files Needed for this lesson: `winemag-data-130k-v2.csv`
>- Download this csv from Canvas prior to the lesson

## The general steps to working with pandas:
1. `import pandas as pd`
>- Note that the `as pd` part is optional, but `pd` is the common alias for pandas and makes the code a bit easier to write
2. Create or load data into a pandas DataFrame or Series
>- In practice, you will likely be loading more datasets than creating but we will learn both
3. Reading data with `pd.read_`
>- Excel files: `pd.read_excel('fileName.xlsx')`
>- Csv files: `pd.read_csv('fileName.csv')`
4. After steps 1-3 you will want to check out your DataFrame
>- Use `shape` to see how many records and columns are in your DataFrame
>- Use `head()` to show the first 5 records in your DataFrame (the default), or `head(n)` to show the first `n` records

Narrated type-along videos are available:

- Part 1: https://youtu.be/g_Kou0MKl1M
- Part 2: https://youtu.be/m63O75SSoXA

# First, check your working directory
>- Your working directory is where you are "working"
>- In other words, where you are opening and saving files. 
>>- In this class, that means your Jupyter notebooks
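
One way to check the working directory is with Python's built-in `os` module. This is a minimal sketch; the variable name `cwd` is just for illustration.

```python
import os

# Get the absolute path of the current working directory
cwd = os.getcwd()
print(cwd)
```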

### Note: if you have a lot of files, like I do, you might want to run a loop to find the one you want
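
A loop like the one described could look something like this. It is only a sketch: it assumes you are looking for csv files, and the names `csv_files` and `file_name` are made up for the example.

```python
import os

# Loop over everything in the working directory and keep only the csv files
csv_files = []
for file_name in sorted(os.listdir()):
    if file_name.lower().endswith('.csv'):
        csv_files.append(file_name)
        print(file_name)
```

If the file you downloaded does not show up in the printed list, it is probably saved in a different folder than your notebook.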

# Step 1: Import pandas and give it an alias
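
The import itself is one line. Printing the version afterwards is an optional extra, shown here only as a quick way to confirm the import worked.

```python
import pandas as pd

# Optional: confirm pandas imported correctly by printing its version
print(pd.__version__)
```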

# Step 2: Create a pandas `DataFrame`
## Definition: a `DataFrame` is a table
>- A `DataFrame` is no different from an Excel table or a table in a SQL database
>- A `DataFrame` contains rows/records and columns

### Let's make a `DataFrame` in the next cell with the `pd.DataFrame()` constructor
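
A minimal sketch of such a cell is below. The column names `'Yes'` and `'No'` come from the notes in this lesson, but the numbers are made up for illustration.

```python
import pandas as pd

# Keys become the column names; each list holds that column's values
# (the numbers here are illustrative only)
df = pd.DataFrame({'Yes': [50, 21], 'No': [131, 2]})
df
```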

### Notes on the previous example:
1. We use the `pd.DataFrame({})` constructor to create a DataFrame from scratch
2. Note we used dictionary syntax where the keys are the column names ('Yes' and 'No') and the values are lists holding the entries for each column
3. The numbers in the far left column are autogenerated index values
>- These values will uniquely identify every row/record in the DataFrame
>- We can specify our own index values with an index parameter after the dictionary
4. This is the most common way of constructing a DataFrame 

### Make another `DataFrame` with string data
>- Suppose we are collecting feedback on several products
>- We can store the data from various customers/reviewers with a DataFrame
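
Here is one way that could look. The reviewer names and review text are hypothetical, chosen only to show that a DataFrame can hold strings as well as numbers.

```python
import pandas as pd

# Each column is one reviewer (hypothetical names); each row is one product
reviews = pd.DataFrame({
    'Bob': ['I liked it.', 'It was awful.'],
    'Sue': ['Pretty good.', 'Bland.'],
})
reviews
```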

### Now add our own index values instead of the auto-generated numbers
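
A sketch of setting custom index values with the `index` parameter follows. The reviewer names, review text, and the labels `'Product A'` and `'Product B'` are all hypothetical.

```python
import pandas as pd

# Pass a list of labels to index= so rows are labeled by product
# instead of 0, 1, ... (all names and text here are made up)
reviews = pd.DataFrame(
    {
        'Bob': ['I liked it.', 'It was awful.'],
        'Sue': ['Pretty good.', 'Bland.'],
    },
    index=['Product A', 'Product B'],
)
reviews
```

With labels in place, a single value can be looked up by row label and column name, for example `reviews.loc['Product B', 'Sue']`.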

# Step 2 (part b) with `Series`
## Definition: a `Series` is a sequence of data values
>- Essentially, a `Series` can be thought of as a single column of a `DataFrame`
>>- And a `DataFrame` can be thought of as a bunch of `Series` placed side by side

### Let's make a `Series` or two in the next few cells
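
Two small examples are sketched below: one with the auto-generated index and one with custom index labels and a name. All values and labels are made up for illustration.

```python
import pandas as pd

# A Series built from a list gets the auto-generated index 0, 1, 2, ...
ratings = pd.Series([1, 2, 3, 4, 5])

# A Series can also take custom index labels and an overall name
# (a Series has one name rather than column names)
sales = pd.Series(
    [30, 35, 40],
    index=['2015 Sales', '2016 Sales', '2017 Sales'],
    name='Product A',
)
sales
```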

# Step 2 (part c) Read Data Into a DataFrame
>- Knowing how to create your own data can be useful
>- However, most of the time we will read data into a DataFrame from a csv or Excel file

## File Needed: `winemag-data-130k-v2.csv`
>- Make sure you download this file from Canvas and place it in your working directory

### Read the csv file with `pd.read_csv('fileName.csv')`
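
So that the sketch below runs even without the download, it reads a tiny stand-in csv held in memory. The rows and the column names `country` and `points` are assumptions made for the example, not the contents of the real file. With the real file in your working directory, you would use the commented-out line instead.

```python
import io
import pandas as pd

# Tiny stand-in for the real file (hypothetical rows and column names)
sample_csv = io.StringIO(
    ",country,points\n"
    "0,Italy,87\n"
    "1,Portugal,87\n"
)

# With the real file in your working directory you would write:
# wine_reviews = pd.read_csv('winemag-data-130k-v2.csv')
wine_reviews = pd.read_csv(sample_csv)
wine_reviews
```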

### Check how many rows/records and columns are in the `wine_reviews` DataFrame
>- Use `shape`
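
Note that `shape` is an attribute, so it is written without parentheses. The sketch below uses a small made-up DataFrame in place of the real data so that it runs on its own.

```python
import pandas as pd

# Small stand-in for wine_reviews (hypothetical values)
wine_reviews = pd.DataFrame({
    'country': ['Italy', 'Portugal', 'US'],
    'points': [87, 87, 90],
})

# shape returns a tuple: (number of rows, number of columns)
n_rows, n_cols = wine_reviews.shape
print(n_rows, n_cols)
```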

### The output returned by `shape` tells us how many rows and columns are in our DataFrame
>- Number of rows: 129,971
>- Number of columns: 14

### Now view a sample of 5 rows of data with `head()`
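
By default `head()` returns the first 5 rows, and passing a number returns that many instead. The sketch below uses a made-up 10-row DataFrame as a stand-in for the real data.

```python
import pandas as pd

# Stand-in DataFrame with 10 rows (hypothetical values)
wine_reviews = pd.DataFrame({'points': list(range(80, 90))})

# head() with no argument returns the first 5 rows
first_five = wine_reviews.head()

# head(3) returns the first 3 rows
first_three = wine_reviews.head(3)
first_five
```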

### Notice how it looks like we have two index columns
>- This is because the csv file already had an index column, but pandas did not automatically treat that column as the index
>- Similar to how we set the index in the DataFrames we created, we can set the `index_col` parameter when we read in data
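
The difference can be sketched with a small in-memory csv whose first column is an unnamed index, which is assumed here to mimic the layout of the real file. The rows and column names are made up. With the real file, the call would be `pd.read_csv('winemag-data-130k-v2.csv', index_col=0)`.

```python
import io
import pandas as pd

# Stand-in csv text whose first column is an unnamed index (hypothetical rows)
csv_text = ",country,points\n0,Italy,87\n1,Portugal,87\n"

# Without index_col, the first column is read as ordinary data
without_index = pd.read_csv(io.StringIO(csv_text))
print(list(without_index.columns))

# index_col=0 tells pandas to use the first column as the index
with_index = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(with_index.columns))
```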