# Pandas Basics Practical (09 / 15 / 2020)

Steps:
1. Import numpy and pandas
2. Load the DataFrame (`pd.read_csv()`)
  - print the dimensions of the df `df.shape`
3. Clean the DataFrame
  - Set the dtype for columns (`.astype()`)
    - Numeric Types
    - Date Times
  - Drop columns that only have null values (`.isnull()`)
  - Set the index
    - Check whether the 'Province_State' is unique (`.value_counts()`)
    - If so, use this column as the index (`.set_index()`)
4. Run some simple operations
 1. Extract a column from the DataFrame (`[ ]` or `.loc[]`)
 2. Extract a row from the DataFrame (`.loc[]` and `.iloc[]`)
 3. Subset the DataFrame using several rows / columns (`.loc[]`)
 4. Find the State with the most recoveries
    1. Find the row with the max (`.max()` gives the value, `.idxmax()` gives its index label)
    2. Use `.loc` to extract that row
 5. Find the states within the first quartile for 'Active' cases
    1. Use `.quantile()` to find the first quartile range
    2. Use `.loc` to subset the dataset using the previous value
 6. Find the states within the third quartile for 'Confirmed' cases
 

## 1. Importing the Pandas Module

In [None]:
import pandas as pd
import numpy as np

## 2. Loading the DataFrame

Use the `pd.read_csv()` command with the provided url

Remember to save the result into a variable (normally `df`)

In [None]:
url = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports_us/09-13-2020.csv'
df = pd.read_csv(url)

### Print the dimensions of the DataFrame using `.shape`

In [None]:
df.shape
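
As a minimal offline sketch of the same load-and-inspect step, the cell below reads a small made-up CSV from a string instead of the URL. The column names mirror the dataset, but `csv_text`, `sample_df`, and all values are hypothetical.

```python
import io

import pandas as pd

# Hypothetical three-row sample; the values are invented for illustration.
csv_text = """Province_State,Confirmed,Recovered
Alabama,100,40
Alaska,20,5
Arizona,300,120
"""

# pd.read_csv accepts a URL, a file path, or a file-like object.
sample_df = pd.read_csv(io.StringIO(csv_text))

# .shape is an attribute (no parentheses): (number of rows, number of columns).
n_rows, n_cols = sample_df.shape
print(n_rows, n_cols)
```
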

## 3. Cleaning the DataFrame

### Setting the correct dtypes
Start by viewing the first 5 rows of the `df`. (`.head()`)

Check whether any columns appear to have the wrong type (FIPS is one).

Then print the types for each column (`df.dtypes`).
- 'object' generally implies a string
- Look for columns that might need to be changed

Changes:
- Convert any columns you identified in the previous step
- Convert 'Last_Update' to a datetime type
  - `pd.to_datetime(df['Last_Update'])` or `df['Last_Update'].astype('datetime64[ns]')`
- Remember to assign the result back to the column: `.astype()` returns a new object and does not modify `df` in place
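
A possible sketch of these conversions on a made-up two-row frame. It assumes FIPS arrives as a float and `Last_Update` as a string, which may not match what you see in the real file; `sample_df` and its values are hypothetical.

```python
import pandas as pd

# Hypothetical sample: FIPS stored as float, Last_Update stored as text.
sample_df = pd.DataFrame({
    "Province_State": ["Alabama", "Alaska"],
    "FIPS": [1.0, 2.0],
    "Last_Update": ["2020-09-14 04:30:40", "2020-09-14 04:30:40"],
})

# .astype() returns a new Series, so assign it back to the column.
sample_df["FIPS"] = sample_df["FIPS"].astype("int64")

# pd.to_datetime parses the strings into a datetime64[ns] column.
sample_df["Last_Update"] = pd.to_datetime(sample_df["Last_Update"])

print(sample_df.dtypes)
```

If a numeric column contains missing values, casting to `"int64"` fails; the nullable `"Int64"` dtype is one alternative in that case.
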


### Drop any columns that only have null values

Compare each column's null count with the number of rows (`df.shape[0]`): a column whose null count equals the row count contains only null values.

`df.isnull().sum()`
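
One way this check could look, sketched on a hypothetical frame where `Empty_Col` is an invented all-null column:

```python
import numpy as np
import pandas as pd

# Hypothetical sample; Empty_Col is made up to demonstrate the check.
sample_df = pd.DataFrame({
    "Province_State": ["Alabama", "Alaska", "Arizona"],
    "Confirmed": [100, 20, 300],
    "Recovered": [40.0, np.nan, 120.0],      # partly null: keep
    "Empty_Col": [np.nan, np.nan, np.nan],   # entirely null: drop
})

# Count nulls per column and keep the names whose count equals the row count.
null_counts = sample_df.isnull().sum()
all_null_cols = null_counts[null_counts == len(sample_df)].index

cleaned = sample_df.drop(columns=all_null_cols)
print(cleaned.columns.tolist())
```

`sample_df.dropna(axis=1, how="all")` gives the same result in a single call.
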

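### Set the index

Set the index to the `'Province_State'` column.

First make sure that each state appears only once by calling the `.value_counts()` method on the column. If every count is 1, the column is unique and can safely be used as the index (`.set_index()`).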
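
A short sketch of the uniqueness check followed by setting the index, using a hypothetical three-row frame:

```python
import pandas as pd

# Hypothetical sample with one row per state.
sample_df = pd.DataFrame({
    "Province_State": ["Alabama", "Alaska", "Arizona"],
    "Confirmed": [100, 20, 300],
})

# value_counts() counts how often each state appears.
counts = sample_df["Province_State"].value_counts()

# If the largest count is 1, every state appears exactly once.
# sample_df["Province_State"].is_unique is an equivalent shortcut.
if counts.max() == 1:
    indexed = sample_df.set_index("Province_State")
    print(indexed)
```
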

## 4. Run some simple operations
 1. Extract a column from the DataFrame (`[ ]` or `.loc[]`)


 2. Extract a row from the DataFrame (`.loc[]` and `.iloc[]`)


 3. Subset the DataFrame using several rows / columns (`.loc[]`)
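
The three extractions above might look like the following sketch. It assumes the index has already been set to the state name; `sample_df` and its values are hypothetical.

```python
import pandas as pd

# Hypothetical sample indexed by state.
sample_df = pd.DataFrame(
    {
        "Confirmed": [100, 20, 300],
        "Recovered": [40, 5, 120],
        "Active": [60, 15, 180],
    },
    index=pd.Index(["Alabama", "Alaska", "Arizona"], name="Province_State"),
)

# 1. A column: both forms return the same Series.
confirmed = sample_df["Confirmed"]
confirmed_loc = sample_df.loc[:, "Confirmed"]

# 2. A row: .loc selects by label, .iloc by integer position.
alaska = sample_df.loc["Alaska"]
first_row = sample_df.iloc[0]

# 3. Several rows and columns: pass lists of labels to .loc.
subset = sample_df.loc[["Alabama", "Arizona"], ["Confirmed", "Active"]]
print(subset)
```
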


 4. Find the State with the most recoveries
    1. Find the row with the max (`.max()` gives the value, `.idxmax()` gives its index label)
    2. Use `.loc` to extract that row
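
One possible approach, sketched on hypothetical data: `.idxmax()` returns the index label of the largest value, which `.loc` accepts directly. Comparing the column against `.max()` is an alternative that also keeps ties.

```python
import pandas as pd

# Hypothetical sample indexed by state.
sample_df = pd.DataFrame(
    {"Confirmed": [100, 20, 300], "Recovered": [40, 5, 120]},
    index=pd.Index(["Alabama", "Alaska", "Arizona"], name="Province_State"),
)

# idxmax() returns the index label (here, the state) of the maximum.
most_recovered_state = sample_df["Recovered"].idxmax()
most_recovered_row = sample_df.loc[most_recovered_state]

# Alternative: a boolean mask against .max() keeps every row that ties.
max_rows = sample_df.loc[sample_df["Recovered"] == sample_df["Recovered"].max()]
print(most_recovered_state)
```
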


 5. Find the states within the first quartile for 'Active' cases
    1. Use `.quantile()` to find the first quartile range
    2. Use `.loc` to subset the dataset using the previous value
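
A minimal sketch of this step on hypothetical data. pandas exposes quartiles through `.quantile()`, so the first quartile boundary is `.quantile(0.25)`; treating "within the first quartile" as "at or below that boundary" is an assumption made here.

```python
import pandas as pd

# Hypothetical sample indexed by state.
sample_df = pd.DataFrame(
    {"Active": [10, 20, 30, 40, 50]},
    index=pd.Index(
        ["Alabama", "Alaska", "Arizona", "Arkansas", "California"],
        name="Province_State",
    ),
)

# 25th percentile of the 'Active' column (upper bound of the first quartile).
q1 = sample_df["Active"].quantile(0.25)

# Keep the states whose 'Active' count is at or below that boundary.
first_quartile_states = sample_df.loc[sample_df["Active"] <= q1]
print(first_quartile_states)
```
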
 

 6. Find the states within the third quartile for 'Confirmed' cases

This step will require two filters (boolean expressions)
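
A sketch of how the two filters could be combined, assuming "within the third quartile" means above the median and at or below the 75th percentile. The data and the name `sample_df` are hypothetical.

```python
import pandas as pd

# Hypothetical sample indexed by state.
sample_df = pd.DataFrame(
    {"Confirmed": [10, 20, 30, 40, 50]},
    index=pd.Index(
        ["Alabama", "Alaska", "Arizona", "Arkansas", "California"],
        name="Province_State",
    ),
)

# Boundaries of the third quartile: the median and the 75th percentile.
q2 = sample_df["Confirmed"].quantile(0.50)
q3 = sample_df["Confirmed"].quantile(0.75)

# Combine the two boolean expressions with &; each needs its own parentheses.
in_third_quartile = (sample_df["Confirmed"] > q2) & (sample_df["Confirmed"] <= q3)
third_quartile_states = sample_df.loc[in_third_quartile]
print(third_quartile_states)
```

The parentheses matter because `&` binds more tightly than the comparison operators in Python.
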