# `pandas` Part 4: Grouping and Sorting

# Learning Objectives
## By the end of this tutorial you will be able to:
1. Group data with `groupby()`
2. Sort data with `sort_values()`
 

## Files Needed for this lesson: `winemag-data-130k-v2.csv`
>- Download this CSV from Canvas prior to the lesson

## The general steps to working with pandas:
1. Import pandas: `import pandas as pd`
2. Create or load data into a pandas DataFrame or Series
3. Read data with a `pd.read_*` function
>- Excel files: `pd.read_excel('fileName.xlsx')`
>- CSV files: `pd.read_csv('fileName.csv')`
>- Note: if the file you want to read is not in the same folder as your notebook, you can do one of two things:
>>- Move the file into the same folder/directory as the notebook
>>- Pass the full file path to the read function
4. After steps 1-3 you will want to check out your DataFrame
>- Use `shape` to see how many records and columns are in your DataFrame
>- Use `head()` to show the first records in your DataFrame (5 by default; pass a number, e.g. `head(10)`, to show more)

Type-along narration: https://youtu.be/gDDqmK5J5Ak

# Analytics Project Framework Notes
## A complete and thorough analytics project will have 3 main areas
1. Descriptive Analytics: tells us what has happened or what is happening. 
>- The focus of this lesson is how to do this in Python.
>- Many companies are at this level but have not moved much beyond it
>- Descriptive statistics (mean, median, mode, frequencies)
>- Graphical analysis (bar charts, pie charts, histograms, box-plots, etc)
2. Predictive Analytics: tells us what is likely to happen next
>- Fewer companies are at this level but are slowly getting there
>- Predictive statistics ("machine learning (ML)" using regression, multi-way frequency analysis, etc)
>- Graphical analysis (scatter plots with regression lines, decision trees, etc)
3. Prescriptive Analytics: tells us what to do based on the analysis
>- Synthesis and Report writing: executive summaries, data-based decision making
>- No analysis is complete without a written report with at least an executive summary
>- Communicate results of analysis to both non-technical and technical audiences

# Descriptive Analytics Using `pandas`

# Initial set-up steps
1. import modules and check working directory
2. Read data in
3. Check the data
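
A minimal sketch of step 1. It assumes the standard-library `os` module is used to check the working directory; the lesson does not name a specific tool, so treat that choice as an assumption.

```python
import os
import pandas as pd

# Relative file names such as 'winemag-data-130k-v2.csv' are looked up
# in the current working directory.
cwd = os.getcwd()
print(cwd)
```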

# Step 2 Read Data Into a DataFrame with `read_csv()`
>- file name: `winemag-data-130k-v2.csv`
>- Set the index to column 0
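
One way this step might look. To keep the sketch runnable without the download, it reads a few made-up rows from an in-memory string; the rows and the `csv_text` name are illustrative only. With the real file, the call would be `pd.read_csv('winemag-data-130k-v2.csv', index_col=0)`.

```python
import io
import pandas as pd

# Hypothetical stand-in for the CSV file: the first (unnamed) column holds row numbers.
csv_text = """,country,points,price,province
0,US,100,80.0,California
1,France,80,12.0,Burgundy
2,China,89,18.0,China
"""

# index_col=0 tells pandas to use column 0 as the index instead of creating a new one.
wine_reviews = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(wine_reviews)
```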

### Check how many rows, columns, and data points are in the `wine_reviews` DataFrame
>- Use `shape` and indices to define variables
>- We can store the values for rows and columns in variables if we want to access them later
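
A short sketch of this check, assuming a small made-up DataFrame in place of the real file. The variable names `rows`, `cols`, and `data_points` are illustrative.

```python
import pandas as pd

# Hypothetical sample rows standing in for winemag-data-130k-v2.csv
wine_reviews = pd.DataFrame({
    "country": ["US", "US", "US", "France", "France", "China", "Italy"],
    "province": ["California", "Colorado", "Colorado", "Bordeaux", "Burgundy", "China", "Tuscany"],
    "points": [100, 87, 85, 100, 80, 89, 80],
    "price": [80.0, 20.0, 30.0, 1500.0, 12.0, 18.0, 69.0],
})

# shape is a (rows, columns) tuple, so each part can be picked out by index.
rows = wine_reviews.shape[0]
cols = wine_reviews.shape[1]
data_points = rows * cols

print(f"{rows} rows, {cols} columns, {data_points} data points")
```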

### Check a couple of rows of data
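
A minimal sketch using `head()`, again on made-up sample rows rather than the real file.

```python
import pandas as pd

# Hypothetical sample rows standing in for winemag-data-130k-v2.csv
wine_reviews = pd.DataFrame({
    "country": ["US", "US", "US", "France", "France", "China", "Italy"],
    "points": [100, 87, 85, 100, 80, 89, 80],
    "price": [80.0, 20.0, 30.0, 1500.0, 12.0, 18.0, 69.0],
})

first_five = wine_reviews.head()    # default: first 5 rows
first_two = wine_reviews.head(2)    # or ask for a specific number of rows
print(first_two)
```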

# Descriptive Analytics with `groupby()`
>- General syntax: `dataFrame.groupby(['fields to group by']).fieldsToAnalyze.aggregation()`
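
To make the general pattern concrete, here is a sketch that groups by `country`, selects `price`, and aggregates with `mean()`. The sample rows are made up for illustration.

```python
import pandas as pd

# Hypothetical sample rows standing in for winemag-data-130k-v2.csv
wine_reviews = pd.DataFrame({
    "country": ["US", "US", "US", "France", "France", "China", "Italy"],
    "points": [100, 87, 85, 100, 80, 89, 80],
    "price": [80.0, 20.0, 30.0, 1500.0, 12.0, 18.0, 69.0],
})

# dataFrame.groupby([fields to group by]).fieldToAnalyze.aggregation()
mean_price_by_country = wine_reviews.groupby(["country"]).price.mean()
print(mean_price_by_country)
```

The result is a Series indexed by the grouping field, with one aggregated value per group.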

### Now, what is/are the question(s) being asked of the data? 
>- All analytics projects start with questions (from you, your boss, some decision maker, etc)

### How many wines have been rated at each point value?
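
One possible answer, sketched on made-up sample rows: group by `points` and count the entries in each group.

```python
import pandas as pd

# Hypothetical sample rows standing in for winemag-data-130k-v2.csv
wine_reviews = pd.DataFrame({
    "country": ["US", "US", "US", "France", "France", "China", "Italy"],
    "points": [100, 87, 85, 100, 80, 89, 80],
    "price": [80.0, 20.0, 30.0, 1500.0, 12.0, 18.0, 69.0],
})

# Number of wines at each point value
wines_per_rating = wine_reviews.groupby("points").points.count()
print(wines_per_rating)
```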

### How much does the least expensive wine for each point rating cost? 
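
A sketch of one way to answer this, using made-up sample rows: group by `points`, then take the minimum `price` in each group.

```python
import pandas as pd

# Hypothetical sample rows standing in for winemag-data-130k-v2.csv
wine_reviews = pd.DataFrame({
    "country": ["US", "US", "US", "France", "France", "China", "Italy"],
    "points": [100, 87, 85, 100, 80, 89, 80],
    "price": [80.0, 20.0, 30.0, 1500.0, 12.0, 18.0, 69.0],
})

# Cheapest wine at each point rating
min_price_by_rating = wine_reviews.groupby("points").price.min()
print(min_price_by_rating)
```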

### How much does the most expensive wine for each point rating cost?
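
This could be answered the same way as a minimum, with `max()` as the aggregation. The rows below are made up for illustration.

```python
import pandas as pd

# Hypothetical sample rows standing in for winemag-data-130k-v2.csv
wine_reviews = pd.DataFrame({
    "country": ["US", "US", "US", "France", "France", "China", "Italy"],
    "points": [100, 87, 85, 100, 80, 89, 80],
    "price": [80.0, 20.0, 30.0, 1500.0, 12.0, 18.0, 69.0],
})

# Most expensive wine at each point rating
max_price_by_rating = wine_reviews.groupby("points").price.max()
print(max_price_by_rating)
```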

### What is the overall maximum price for all wines?
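
No grouping is needed for an overall figure; a sketch on made-up sample rows simply calls `max()` on the `price` column. The second line shows that the largest of the per-rating maximums gives the same value.

```python
import pandas as pd

# Hypothetical sample rows standing in for winemag-data-130k-v2.csv
wine_reviews = pd.DataFrame({
    "country": ["US", "US", "US", "France", "France", "China", "Italy"],
    "points": [100, 87, 85, 100, 80, 89, 80],
    "price": [80.0, 20.0, 30.0, 1500.0, 12.0, 18.0, 69.0],
})

overall_max_price = wine_reviews.price.max()
# Equivalent check: the largest of the per-rating maximums
max_of_group_maxes = wine_reviews.groupby("points").price.max().max()
print(overall_max_price, max_of_group_maxes)
```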

### What is the lowest price for a wine rating of 100?
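
One way to pull out a single group's value, sketched on made-up sample rows: compute the minimum price per rating, then select the row labelled `100` with `.loc`.

```python
import pandas as pd

# Hypothetical sample rows standing in for winemag-data-130k-v2.csv
wine_reviews = pd.DataFrame({
    "country": ["US", "US", "US", "France", "France", "China", "Italy"],
    "points": [100, 87, 85, 100, 80, 89, 80],
    "price": [80.0, 20.0, 30.0, 1500.0, 12.0, 18.0, 69.0],
})

# The groupby result is indexed by points, so .loc[100] selects that rating.
lowest_price_100 = wine_reviews.groupby("points").price.min().loc[100]
print(lowest_price_100)
```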

### What is the highest price for a wine rating of 80? 
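
A sketch along the same lines, on made-up sample rows: take the maximum price per rating and select the entry for `80`.

```python
import pandas as pd

# Hypothetical sample rows standing in for winemag-data-130k-v2.csv
wine_reviews = pd.DataFrame({
    "country": ["US", "US", "US", "France", "France", "China", "Italy"],
    "points": [100, 87, 85, 100, 80, 89, 80],
    "price": [80.0, 20.0, 30.0, 1500.0, 12.0, 18.0, 69.0],
})

highest_price_80 = wine_reviews.groupby("points").price.max().loc[80]
print(highest_price_80)
```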

### What is the maximum rating for each country? 
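
Here the grouping field changes to `country` and the analyzed field to `points`. A minimal sketch on made-up sample rows:

```python
import pandas as pd

# Hypothetical sample rows standing in for winemag-data-130k-v2.csv
wine_reviews = pd.DataFrame({
    "country": ["US", "US", "US", "France", "France", "China", "Italy"],
    "points": [100, 87, 85, 100, 80, 89, 80],
    "price": [80.0, 20.0, 30.0, 1500.0, 12.0, 18.0, 69.0],
})

# Highest rating awarded to any wine from each country
max_points_by_country = wine_reviews.groupby("country").points.max()
print(max_points_by_country)
```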

### What is the maximum rating for China?
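
One possible approach, sketched on made-up sample rows: compute the maximum rating per country and select the `'China'` entry with `.loc`.

```python
import pandas as pd

# Hypothetical sample rows standing in for winemag-data-130k-v2.csv
wine_reviews = pd.DataFrame({
    "country": ["US", "US", "US", "France", "France", "China", "Italy"],
    "points": [100, 87, 85, 100, 80, 89, 80],
    "price": [80.0, 20.0, 30.0, 1500.0, 12.0, 18.0, 69.0],
})

max_points_china = wine_reviews.groupby("country").points.max().loc["China"]
print(max_points_china)
```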

##### Another way to get the maximum rating for China, combining `where()` and `groupby()`
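
A sketch of how the two might be combined, on made-up sample rows. `where()` keeps rows that meet the condition and fills every other row with missing values; `groupby()` then ignores the rows whose group key is missing, so only China remains.

```python
import pandas as pd

# Hypothetical sample rows standing in for winemag-data-130k-v2.csv
wine_reviews = pd.DataFrame({
    "country": ["US", "US", "US", "France", "France", "China", "Italy"],
    "points": [100, 87, 85, 100, 80, 89, 80],
    "price": [80.0, 20.0, 30.0, 1500.0, 12.0, 18.0, 69.0],
})

# Rows that are not from China become missing values and drop out of the grouping.
china_only = wine_reviews.where(wine_reviews.country == "China")
max_points_china = china_only.groupby("country").points.max()
print(max_points_china)
```

Because `where()` introduces missing values, the `points` column is converted to floating point, so the result prints with a decimal point.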

### What are some summary stats for price for each country? 
>- Using the `agg()` function for specific summary stats
>>- What is the sample size?
>>- What is the minimum?
>>- What is the maximum?
>>- What is the mean?
>>- What is the median?
>>- What is the standard deviation? 
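
The six statistics listed above can be requested in a single `agg()` call by passing a list of aggregation names. The sketch below uses made-up sample rows.

```python
import pandas as pd

# Hypothetical sample rows standing in for winemag-data-130k-v2.csv
wine_reviews = pd.DataFrame({
    "country": ["US", "US", "US", "France", "France", "China", "Italy"],
    "points": [100, 87, 85, 100, 80, 89, 80],
    "price": [80.0, 20.0, 30.0, 1500.0, 12.0, 18.0, 69.0],
})

# One row per country, one column per statistic
price_stats = wine_reviews.groupby("country").price.agg(
    ["count", "min", "max", "mean", "median", "std"]
)
print(price_stats)
```

A country with a single wine has no sample standard deviation, so its `std` shows as `NaN`.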

## What are the descriptive analytics for country and province? 
>- We can group by multiple fields by adding more field names to our `groupby()` call
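
A minimal sketch of grouping by two fields, on made-up sample rows. Price is used as the analyzed field here; that choice is an assumption, since the question does not name one.

```python
import pandas as pd

# Hypothetical sample rows standing in for winemag-data-130k-v2.csv
wine_reviews = pd.DataFrame({
    "country": ["US", "US", "US", "France", "France", "China", "Italy"],
    "province": ["California", "Colorado", "Colorado", "Bordeaux", "Burgundy", "China", "Tuscany"],
    "points": [100, 87, 85, 100, 80, 89, 80],
    "price": [80.0, 20.0, 30.0, 1500.0, 12.0, 18.0, 69.0],
})

# Passing a list of fields creates one group per (country, province) pair.
stats_by_region = wine_reviews.groupby(["country", "province"]).price.agg(
    ["count", "min", "max", "mean"]
)
print(stats_by_region)
```

The result has a two-level index, so a single group is selected with a tuple, e.g. `stats_by_region.loc[("US", "Colorado")]`.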

## What are the descriptive price analytics for the US?
>- Add `get_group()` syntax
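
`get_group()` returns the rows belonging to one group as a regular DataFrame, which can then be summarized. A sketch on made-up sample rows:

```python
import pandas as pd

# Hypothetical sample rows standing in for winemag-data-130k-v2.csv
wine_reviews = pd.DataFrame({
    "country": ["US", "US", "US", "France", "France", "China", "Italy"],
    "points": [100, 87, 85, 100, 80, 89, 80],
    "price": [80.0, 20.0, 30.0, 1500.0, 12.0, 18.0, 69.0],
})

# Keep only the 'US' group, then summarize its prices.
us_wines = wine_reviews.groupby("country").get_group("US")
us_price_stats = us_wines.price.agg(["count", "min", "max", "mean", "median", "std"])
print(us_price_stats)
```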

## What are the summary wine rating stats for Colorado? 
>- Note that states are coded in this dataset under province
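
Since states sit in the `province` column, one way to answer this is to group by `province` and pull out the `'Colorado'` group. The rows below are made up for illustration.

```python
import pandas as pd

# Hypothetical sample rows standing in for winemag-data-130k-v2.csv
wine_reviews = pd.DataFrame({
    "country": ["US", "US", "US", "France", "France", "China", "Italy"],
    "province": ["California", "Colorado", "Colorado", "Bordeaux", "Burgundy", "China", "Tuscany"],
    "points": [100, 87, 85, 100, 80, 89, 80],
    "price": [80.0, 20.0, 30.0, 1500.0, 12.0, 18.0, 69.0],
})

# Group on province (where states are stored), then summarize the ratings.
colorado = wine_reviews.groupby("province").get_group("Colorado")
colorado_points_stats = colorado.points.agg(["count", "min", "max", "mean", "median", "std"])
print(colorado_points_stats)
```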

# Sorting Results
>- Add `sort_values()` syntax
>- Default is ascending order

## What are the summary stats for points for each country?
>- Sort the results from lowest to highest mean points
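
A sketch of sorting grouped results, on made-up sample rows: build the summary table, then sort it by its `mean` column.

```python
import pandas as pd

# Hypothetical sample rows standing in for winemag-data-130k-v2.csv
wine_reviews = pd.DataFrame({
    "country": ["US", "US", "US", "France", "France", "China", "Italy"],
    "points": [100, 87, 85, 100, 80, 89, 80],
    "price": [80.0, 20.0, 30.0, 1500.0, 12.0, 18.0, 69.0],
})

points_stats = wine_reviews.groupby("country").points.agg(["count", "min", "max", "mean"])

# sort_values() sorts in ascending order by default.
points_stats_sorted = points_stats.sort_values(by="mean")
print(points_stats_sorted)
```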

### To sort in descending order...
>- Use `ascending=False`
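
The same summary sorted from highest to lowest mean, sketched on made-up sample rows:

```python
import pandas as pd

# Hypothetical sample rows standing in for winemag-data-130k-v2.csv
wine_reviews = pd.DataFrame({
    "country": ["US", "US", "US", "France", "France", "China", "Italy"],
    "points": [100, 87, 85, 100, 80, 89, 80],
    "price": [80.0, 20.0, 30.0, 1500.0, 12.0, 18.0, 69.0],
})

points_stats = wine_reviews.groupby("country").points.agg(["count", "min", "max", "mean"])

# ascending=False reverses the default order.
points_stats_desc = points_stats.sort_values(by="mean", ascending=False)
print(points_stats_desc)
```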