In [1]:
# To use `Pandas`, you'll typically start with the following line of code.
import pandas as pd

## Creating Data
There are two core objects in pandas: the `DataFrame` and the `Series`.

### DataFrame
A **DataFrame is a table**.\
It contains an array of individual entries, each of which has a certain value.\
Each entry corresponds to a row (or record) and a column.

In [2]:
pd.DataFrame({'Yes': [50, 21], 'No': [131, 2]})

,Yes,No
0,50,131
1,21,2


In [3]:
pd.DataFrame({'Apples': [30], 'Bananas': [21]})

,Apples,Bananas
0,30,21


In [4]:
pd.DataFrame({'Bob': ['I like it.', 'It was awful.'], 'Sue': ['Pretty good.', 'Bland.']})

,Bob,Sue
0,I like it.,Pretty good.
1,It was awful.,Bland.


We are using the `pd.DataFrame()` constructor to generate these DataFrame objects.\
The syntax for declaring a new one is a dictionary whose keys are the column names (`Bob` and `Sue` in this example) and whose values are lists of entries.\
This is the standard way of constructing a new DataFrame, and the one you are most likely to encounter.
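
As a small sketch of the dictionary form described above, the snippet below builds the same table and inspects its column names and shape. The variable name `reviews` is only illustrative and does not appear in the original notebook.

```python
import pandas as pd

# Keys become column names; each list supplies that column's values, one per row.
reviews = pd.DataFrame({
    'Bob': ['I like it.', 'It was awful.'],
    'Sue': ['Pretty good.', 'Bland.'],
})

print(list(reviews.columns))  # column names taken from the dictionary keys
print(reviews.shape)          # (number of rows, number of columns)
```

Note that the lists must all have the same length; otherwise the constructor raises a `ValueError`.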

### Row labels
The list of row labels used in a DataFrame is known as an `Index`.\
We can assign values to it by using an `index` parameter in our constructor:

In [5]:
pd.DataFrame({
    'Bob': ['I liked it.', 'It was awful.'],
    'Sue': ['Pretty good.', 'Bland.']},
    index=['Product A', 'Product B'])

,Bob,Sue
Product A,I liked it.,Pretty good.
Product B,It was awful.,Bland.
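
To illustrate what row labels are good for, here is a minimal sketch that reads the labels back and looks up a single entry by label. It uses the `.loc` accessor, which this section does not otherwise cover, and the variable name `reviews` is an assumption made for the example.

```python
import pandas as pd

reviews = pd.DataFrame(
    {'Bob': ['I liked it.', 'It was awful.'],
     'Sue': ['Pretty good.', 'Bland.']},
    index=['Product A', 'Product B'],
)

print(list(reviews.index))              # the row labels stored in the Index
print(reviews.loc['Product B', 'Sue'])  # one entry, selected by row label and column name
```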


### Series
A Series, by contrast, is a sequence of data values.\
If a DataFrame is a table, a Series is a list.\
And in fact you can create one with nothing more than a list:

In [7]:
pd.Series([1, 2, 3, 4, 5])

0    1
1    2
2    3
3    4
4    5
dtype: int64

A Series is, in essence, a single column of a DataFrame.\
So you can assign row labels to the Series the same way as before, using an `index` parameter.\
However, a Series does not have a column name; it only has one overall name:

In [10]:
pd.Series([30, 35, 40], index=['2018 Sales', '2019 Sales', '2020 Sales'], name='Product A')

2018 Sales    30
2019 Sales    35
2020 Sales    40
Name: Product A, dtype: int64
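
The relationship between the two objects can be sketched the other way around: selecting one column of a DataFrame gives back a Series whose name is the column name. The `Product B` values and the variable names below are made up for illustration.

```python
import pandas as pd

sales = pd.DataFrame(
    {'Product A': [30, 35, 40], 'Product B': [21, 34, 27]},  # 'Product B' is illustrative
    index=['2018 Sales', '2019 Sales', '2020 Sales'],
)

column = sales['Product A']    # selecting a single column returns a Series
print(type(column).__name__)
print(column.name)             # the Series name is the column name
print(list(column.index))      # the row labels carry over from the DataFrame
```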

### Reading data files
Being able to create a DataFrame or Series by hand is handy.\
But, most of the time, we won't actually be creating our own data by hand. Instead, we'll be working with data that already exists.

In [28]:
house_reviews = pd.read_csv('../data/002/melb_data.csv')

house_reviews.shape # check how large the resulting DataFrame is
# So our new DataFrame has 13580 records split across 21 different columns.
# That's just over 285,000 entries (13580 * 21 = 285180)!

(13580, 21)
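
The relationship between `shape` and the total number of entries can be checked on a tiny example. The sketch below reads CSV text from an in-memory string instead of a file, so it runs without the dataset; the CSV content and the name `homes` are assumptions made for illustration.

```python
import io
import pandas as pd

# A tiny stand-in for a CSV file on disk (illustrative values only).
csv_text = "Suburb,Rooms,Price\nAbbotsford,2,1480000\nAbbotsford,3,1465000\n"
homes = pd.read_csv(io.StringIO(csv_text))

rows, cols = homes.shape
print(rows, cols)    # number of records and number of columns
print(homes.size)    # total number of entries, equal to rows * cols
print(13580 * 21)    # the same arithmetic for a (13580, 21) DataFrame
```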

We can examine the contents of the resulting DataFrame using the `head()` method, which returns the first five rows by default:

In [29]:
house_reviews.head()

,Suburb,Address,Rooms,Type,Price,Method,SellerG,Date,Distance,Postcode,...,Bathroom,Car,Landsize,BuildingArea,YearBuilt,CouncilArea,Lattitude,Longtitude,Regionname,Propertycount
0,Abbotsford,85 Turner St,2,h,1480000.0,S,Biggin,3/12/2016,2.5,3067.0,...,1.0,1.0,202.0,,,Yarra,-37.7996,144.9984,Northern Metropolitan,4019.0
1,Abbotsford,25 Bloomburg St,2,h,1035000.0,S,Biggin,4/02/2016,2.5,3067.0,...,1.0,0.0,156.0,79.0,1900.0,Yarra,-37.8079,144.9934,Northern Metropolitan,4019.0
2,Abbotsford,5 Charles St,3,h,1465000.0,SP,Biggin,4/03/2017,2.5,3067.0,...,2.0,0.0,134.0,150.0,1900.0,Yarra,-37.8093,144.9944,Northern Metropolitan,4019.0
3,Abbotsford,40 Federation La,3,h,850000.0,PI,Biggin,4/03/2017,2.5,3067.0,...,2.0,1.0,94.0,,,Yarra,-37.7969,144.9969,Northern Metropolitan,4019.0
4,Abbotsford,55a Park St,4,h,1600000.0,VB,Nelson,4/06/2016,2.5,3067.0,...,1.0,2.0,120.0,142.0,2014.0,Yarra,-37.8072,144.9941,Northern Metropolitan,4019.0
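
`head()` also accepts the number of rows to return. A minimal sketch on a throwaway DataFrame (the name `numbers` is illustrative):

```python
import pandas as pd

numbers = pd.DataFrame({'n': range(10)})

print(len(numbers.head()))   # default: the first five rows
print(len(numbers.head(3)))  # an explicit row count
```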


In [41]:
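# index_col=0 would use the first column of the CSV file as the index:
# house_reviews = pd.read_csv('../data/002/melb_data.csv', index_col=0)
# Here we use the `Rooms` column as the index instead.
house_reviews = pd.read_csv('../data/002/melb_data.csv', index_col='Rooms')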

In [42]:
house_reviews.head()

Rooms,Suburb,Address,Type,Price,Method,SellerG,Date,Distance,Postcode,Bedroom2,Bathroom,Car,Landsize,BuildingArea,YearBuilt,CouncilArea,Lattitude,Longtitude,Regionname,Propertycount
2,Abbotsford,85 Turner St,h,1480000.0,S,Biggin,3/12/2016,2.5,3067.0,2.0,1.0,1.0,202.0,,,Yarra,-37.7996,144.9984,Northern Metropolitan,4019.0
2,Abbotsford,25 Bloomburg St,h,1035000.0,S,Biggin,4/02/2016,2.5,3067.0,2.0,1.0,0.0,156.0,79.0,1900.0,Yarra,-37.8079,144.9934,Northern Metropolitan,4019.0
3,Abbotsford,5 Charles St,h,1465000.0,SP,Biggin,4/03/2017,2.5,3067.0,3.0,2.0,0.0,134.0,150.0,1900.0,Yarra,-37.8093,144.9944,Northern Metropolitan,4019.0
3,Abbotsford,40 Federation La,h,850000.0,PI,Biggin,4/03/2017,2.5,3067.0,3.0,2.0,1.0,94.0,,,Yarra,-37.7969,144.9969,Northern Metropolitan,4019.0
4,Abbotsford,55a Park St,h,1600000.0,VB,Nelson,4/06/2016,2.5,3067.0,3.0,1.0,2.0,120.0,142.0,2014.0,Yarra,-37.8072,144.9941,Northern Metropolitan,4019.0
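
The effect of `index_col` can be sketched on a small in-memory CSV: the chosen column becomes the row labels and no longer appears among the regular columns. It can be given by name or by position. The CSV text and variable names below are illustrative assumptions.

```python
import io
import pandas as pd

csv_text = "Rooms,Suburb,Price\n2,Abbotsford,1480000\n3,Abbotsford,1465000\n"

by_name = pd.read_csv(io.StringIO(csv_text), index_col='Rooms')  # column chosen by name
by_position = pd.read_csv(io.StringIO(csv_text), index_col=0)    # column chosen by position

print(by_name.index.name)     # the index is now named after the chosen column
print(list(by_name.columns))  # 'Rooms' is no longer a regular column
print(list(by_position.index))
```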


### Saving a DataFrame to disk as a CSV file

In [47]:
house_reviews.to_csv('ex_01_house_review.csv')
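
As a rough sketch of a full round trip, the snippet below writes a small DataFrame to a temporary directory and reads it back. By default `to_csv` writes the index as the first column, so passing `index_col=0` when reading restores it. The file name and variable names are hypothetical.

```python
import os
import tempfile
import pandas as pd

reviews = pd.DataFrame(
    {'Bob': ['I liked it.', 'It was awful.'],
     'Sue': ['Pretty good.', 'Bland.']},
    index=['Product A', 'Product B'],
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'reviews.csv')  # hypothetical file name
    reviews.to_csv(path)                         # the index is written as the first column
    restored = pd.read_csv(path, index_col=0)    # read that first column back as the index

print(restored)
```

Reading the file back without `index_col=0` would instead leave the saved labels in an ordinary column named `Unnamed: 0`.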