# DataFrames

### DataFrames are Pandas "tables" made up of columns and rows
* Each column of data in a DataFrame is a Pandas Series, and all columns share the same row index
* The column headers work as a column index that contains the Series names
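To make the two points above concrete, here is a minimal sketch on a tiny, made-up DataFrame (the `df` and `family` names are only for illustration). It pulls out one column and checks that it is a Series whose name is the column header and whose index matches the DataFrame's row index.

```python
import pandas as pd

# Small illustrative DataFrame (hypothetical data, same shape as the example further down)
df = pd.DataFrame({"id": [1, 2], "family": ["POULTRY", "PRODUCE"]})

family = df["family"]                  # selecting a single column returns a Series
print(type(family).__name__)           # the column is a pandas Series
print(family.name)                     # the Series name is the column header
print(family.index.equals(df.index))   # the Series shares the DataFrame's row index
print(list(df.columns))                # the column index holds the Series names
```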

### You can create a DataFrame from a Python dictionary or NumPy array by using the Pandas DataFrame() constructor

In [1]:
import numpy as np
import pandas as pd

In [2]:
'''
It is unusual to build a DataFrame from a dictionary written by hand in Python.
Typically you will load a CSV or Excel file instead.
This example just shows that DataFrames can be created in several ways.
'''

pd.DataFrame(
    {"id": [1, 2],
     "store_nbr": [1, 2],
     "family": ["POULTRY", "PRODUCE"]
    }
)

Unnamed: 0,id,store_nbr,family
0,1,1,POULTRY
1,2,2,PRODUCE
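The heading above also mentions NumPy arrays, so here is a minimal sketch of that route. The `values` array and the `df_from_array` name are made up for illustration. Because a plain array carries no column names, they are passed through the `columns` argument.

```python
import numpy as np
import pandas as pd

# A 2x2 array: each inner list becomes one row of the DataFrame
values = np.array([[1, 1], [2, 2]])

# An array has no column labels, so supply them explicitly
df_from_array = pd.DataFrame(values, columns=["id", "store_nbr"])
print(df_from_array)
```

If `columns` is left out, Pandas falls back to integer column labels (0, 1, ...).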


## Create a DataFrame

In [15]:
oil = pd.read_csv('../DataFrames/oil.csv')
oil

Unnamed: 0,date,dcoilwtico
0,2013-01-01,
1,2013-01-02,93.14
2,2013-01-03,92.97
3,2013-01-04,93.12
4,2013-01-07,93.20
...,...,...
1213,2017-08-25,47.65
1214,2017-08-28,46.40
1215,2017-08-29,46.46
1216,2017-08-30,45.96


In [None]:
oil.shape # show the shape. 1218 rows and 2 columns

In [None]:
oil.index # shows the index

In [None]:
oil.columns # shows the column names

In [None]:
oil.axes # displays both axes: the row index and the columns

In [None]:
oil.dtypes # datatypes of columns
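Since `oil.csv` may not be at hand, the sketch below repeats the same attribute checks on a three-row stand-in. The `oil_demo` name is hypothetical and the values are copied from the first rows of the output above, so treat it as an illustration rather than the real dataset.

```python
import numpy as np
import pandas as pd

# Tiny stand-in for the oil data (first three rows only; the first price is missing)
oil_demo = pd.DataFrame(
    {"date": ["2013-01-01", "2013-01-02", "2013-01-03"],
     "dcoilwtico": [np.nan, 93.14, 92.97]}
)

n_rows, n_cols = oil_demo.shape    # shape is a (rows, columns) tuple
print(n_rows, n_cols)
print(oil_demo.index)              # the row index
print(oil_demo.columns)            # the column index
print(oil_demo.axes)               # a list holding the row index, then the columns
print(oil_demo.dtypes)             # one data type per column
```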

## Exploring a DataFrame

* You can explore a DataFrame using the methods below.

In [13]:
retail = pd.read_csv('../DataFrames/retail_2016_2017.csv')
retail

Unnamed: 0,id,date,store_nbr,family,sales,onpromotion
0,1945944,2016-01-01,1,AUTOMOTIVE,0.000,0
1,1945945,2016-01-01,1,BABY CARE,0.000,0
2,1945946,2016-01-01,1,BEAUTY,0.000,0
3,1945947,2016-01-01,1,BEVERAGES,0.000,0
4,1945948,2016-01-01,1,BOOKS,0.000,0
...,...,...,...,...,...,...
1054939,3000883,2017-08-15,9,POULTRY,438.133,0
1054940,3000884,2017-08-15,9,PREPARED FOODS,154.553,1
1054941,3000885,2017-08-15,9,PRODUCE,2419.729,148
1054942,3000886,2017-08-15,9,SCHOOL AND OFFICE SUPPLIES,121.000,8


### .head() and .tail() methods return the top or bottom rows in a DataFrame
* This is a great way to QA data upon import!

In [None]:
retail.head() # .head(n) returns the first n rows of the DataFrame (5 by default)

In [None]:
retail.tail() # .tail(n) returns the last n rows of the DataFrame (5 by default)
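Here is a small sketch of the default and explicit row counts, using a made-up ten-row DataFrame (`demo` is a hypothetical name) so the result is easy to verify by eye.

```python
import pandas as pd

# Ten-row illustrative DataFrame with ids 0 through 9
demo = pd.DataFrame({"id": range(10), "sales": [i * 1.5 for i in range(10)]})

first_five = demo.head()     # no argument: the first 5 rows
last_three = demo.tail(3)    # explicit n: the last 3 rows

print(first_five)
print(last_three)
```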

### The .sample() method returns a random sample of rows from a DataFrame

In [None]:
retail.sample(5) # .sample(n, random_state=12345) returns n randomly sampled rows (1 by default).
                 # You can also pass random_state to reproduce the identical sample elsewhere or keep this one consistent
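The effect of `random_state` is easiest to see by sampling twice with the same seed. This is a minimal sketch on a made-up 100-row DataFrame. The `demo`, `sample_a`, `sample_b`, and `one_row` names are hypothetical, and 12345 is just the seed from the comment above.

```python
import pandas as pd

# Illustrative DataFrame with 100 rows
demo = pd.DataFrame({"id": range(100)})

# The same seed gives the same rows both times
sample_a = demo.sample(5, random_state=12345)
sample_b = demo.sample(5, random_state=12345)
print(sample_a.equals(sample_b))

# With no arguments, .sample() returns a single random row
one_row = demo.sample()
print(len(one_row))
```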

### The .info() method returns details on a DataFrame's properties and memory usage

In [None]:
retail.info() # .info() Returns key details on DataFrame size, columns, and memory usage

By default, the .info() method shows non-null counts only for DataFrames with fewer than ~1.7 million rows,
but you can specify show_counts=True to ensure they are always displayed.

This is a great way to quickly identify missing values: if the non-null count is less than the total number of rows,
then the difference is the number of NaN values in that column!
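The arithmetic described above can be sketched on a small stand-in DataFrame (`oil_demo` is a hypothetical name, with one price deliberately missing). It assumes a Pandas version recent enough to accept `show_counts`; older releases used a differently named argument.

```python
import numpy as np
import pandas as pd

# Three rows, one missing price
oil_demo = pd.DataFrame(
    {"date": ["2013-01-01", "2013-01-02", "2013-01-03"],
     "dcoilwtico": [np.nan, 93.14, 92.97]}
)

oil_demo.info(show_counts=True)    # always print the non-null counts

# Total rows minus non-null count = number of missing values per column
non_null = oil_demo.count()
missing = len(oil_demo) - non_null
print(missing)

# Counting NaN values directly gives the same numbers
print(oil_demo.isna().sum())
```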

### The .describe() method returns key statistics on a DataFrame's columns


In [None]:
retail.describe() # .describe() returns descriptive statistics for the columns in a DataFrame
               # (only numeric columns by default; use the include argument, e.g. include="all", to cover more columns)
               # you can also chain .round() to suppress scientific notation and display more readable numbers

In [None]:
retail.describe(include="all").round() # you can also use .round() to suppress scientific notation to display more readable numbers
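To see what `include="all"` adds, here is a minimal sketch on a four-row stand-in. The `retail_demo` name is hypothetical and the rows are made up for illustration, loosely based on the output above. The default call covers only the numeric columns, while `include="all"` also reports counts, unique values, and the most frequent value for the text column.

```python
import pandas as pd

# Four illustrative rows with one text column and two numeric columns
retail_demo = pd.DataFrame(
    {"family": ["POULTRY", "PRODUCE", "PRODUCE", "BOOKS"],
     "sales": [438.133, 2419.729, 154.553, 0.0],
     "onpromotion": [0, 148, 1, 0]}
)

numeric_stats = retail_demo.describe()            # numeric columns only
all_stats = retail_demo.describe(include="all")   # text columns as well
rounded = numeric_stats.round(2)                  # fewer decimals, easier to read

print(numeric_stats)
print(all_stats)
print(rounded)
```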