
#Pandas
- Creating, Reading, Writing
- Indexing, Selecting, Assigning
- Summary functions and maps
- Grouping and Sorting
- Data types and Missing Values
- Renaming and Combining

In [1]:
#importing pandas library
import pandas as pd

##Creating Data
- there are two core objects in pandas
1. Dataframe
2. Series

###1. Dataframe
- a dataframe is a table
- it contains an array of individual entries, each of which has a certain value
- each entry corresponds to a row (or record) and a column
- we use the <b>pd.DataFrame()</b> constructor to create a dataframe

In [2]:
#sample code
pd.DataFrame({"Name":["Arun", "Arunisto"], "Age":[27, 28]})

Unnamed: 0,Name,Age
0,Arun,27
1,Arunisto,28


In [3]:
#we can also set our own index labels
pd.DataFrame({"Name":["Arun", "Arunisto"], "Age":[27, 28]},
             index=['Person 1', 'Person 2'])

Unnamed: 0,Name,Age
Person 1,Arun,27
Person 2,Arunisto,28


###2. Series
- a series, by contrast, is a sequence of data values
- if a dataframe is a table, a series is a list
- <b>pd.Series()</b> creates a series from a list of values

In [4]:
#sample code
pd.Series([1, 2, 3, 4, 5])

0    1
1    2
2    3
3    4
4    5
dtype: int64

- a series is, in essence, a single column of a dataframe
- so you can assign index labels to a series in the same way as before
- a series doesn't have column names, it has only one overall name

In [5]:
pd.Series([5.0, 4.7, 4.5, 3.2],
          index=["Review 1", "Review 2", "Review 3", "Review 4"],
          name="Oppenheimer Movie")

Review 1    5.0
Review 2    4.7
Review 3    4.5
Review 4    3.2
Name: Oppenheimer Movie, dtype: float64
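
The link between the two objects can be checked with a small sketch. The `people` dataframe below is toy data made up for this illustration; selecting one of its columns should give back a series whose name is the column name.

```python
import pandas as pd

# toy data for illustration only (not part of the housing dataset)
people = pd.DataFrame({"Name": ["Arun", "Arunisto"], "Age": [27, 28]},
                      index=["Person 1", "Person 2"])

# selecting one column returns a Series
ages = people["Age"]

print(type(ages).__name__)   # Series
print(ages.name)             # Age (the column name becomes the series name)
print(list(ages.index))      # ['Person 1', 'Person 2'] (same index as the dataframe)
```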

##Reading data files
- here we read the data from a "csv" file using pd.read_csv()

In [6]:
#reading csv file
house_data = pd.read_csv("/content/drive/MyDrive/Datascience&MachineLearning/datasets/melb_data.csv")


In [7]:
#shape - to check how large the resulting dataframe is
house_data.shape

(13580, 21)

- 13580 records with 21 different columns

In [8]:
#head() - which grabs the first five rows
house_data.head()

Unnamed: 0,Suburb,Address,Rooms,Type,Price,Method,SellerG,Date,Distance,Postcode,...,Bathroom,Car,Landsize,BuildingArea,YearBuilt,CouncilArea,Lattitude,Longtitude,Regionname,Propertycount
0,Abbotsford,85 Turner St,2,h,1480000.0,S,Biggin,3/12/2016,2.5,3067.0,...,1.0,1.0,202.0,,,Yarra,-37.7996,144.9984,Northern Metropolitan,4019.0
1,Abbotsford,25 Bloomburg St,2,h,1035000.0,S,Biggin,4/02/2016,2.5,3067.0,...,1.0,0.0,156.0,79.0,1900.0,Yarra,-37.8079,144.9934,Northern Metropolitan,4019.0
2,Abbotsford,5 Charles St,3,h,1465000.0,SP,Biggin,4/03/2017,2.5,3067.0,...,2.0,0.0,134.0,150.0,1900.0,Yarra,-37.8093,144.9944,Northern Metropolitan,4019.0
3,Abbotsford,40 Federation La,3,h,850000.0,PI,Biggin,4/03/2017,2.5,3067.0,...,2.0,1.0,94.0,,,Yarra,-37.7969,144.9969,Northern Metropolitan,4019.0
4,Abbotsford,55a Park St,4,h,1600000.0,VB,Nelson,4/06/2016,2.5,3067.0,...,1.0,2.0,120.0,142.0,2014.0,Yarra,-37.8072,144.9941,Northern Metropolitan,4019.0


- we can choose which column of the file becomes the index by using the index_col parameter of pd.read_csv()
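
A minimal sketch of `index_col`, assuming a tiny CSV held in a string instead of the real file (the `csv_text` content is hypothetical and only stands in for a file on disk):

```python
import io
import pandas as pd

# hypothetical CSV text standing in for a real file
csv_text = "Suburb,Rooms,Price\nAbbotsford,2,1480000\nAbbotsford,3,1465000\n"

# index_col=0 tells read_csv to use the first column as the index
# instead of creating a new 0, 1, 2, ... index
df = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(df.index.name)      # Suburb
print(list(df.columns))   # ['Rooms', 'Price']
```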

##Indexing, Selecting and Assigning

- native python objects provide good ways of indexing data, and pandas supports the same styles

In [9]:
house_data

Unnamed: 0,Suburb,Address,Rooms,Type,Price,Method,SellerG,Date,Distance,Postcode,...,Bathroom,Car,Landsize,BuildingArea,YearBuilt,CouncilArea,Lattitude,Longtitude,Regionname,Propertycount
0,Abbotsford,85 Turner St,2,h,1480000.0,S,Biggin,3/12/2016,2.5,3067.0,...,1.0,1.0,202.0,,,Yarra,-37.79960,144.99840,Northern Metropolitan,4019.0
1,Abbotsford,25 Bloomburg St,2,h,1035000.0,S,Biggin,4/02/2016,2.5,3067.0,...,1.0,0.0,156.0,79.0,1900.0,Yarra,-37.80790,144.99340,Northern Metropolitan,4019.0
2,Abbotsford,5 Charles St,3,h,1465000.0,SP,Biggin,4/03/2017,2.5,3067.0,...,2.0,0.0,134.0,150.0,1900.0,Yarra,-37.80930,144.99440,Northern Metropolitan,4019.0
3,Abbotsford,40 Federation La,3,h,850000.0,PI,Biggin,4/03/2017,2.5,3067.0,...,2.0,1.0,94.0,,,Yarra,-37.79690,144.99690,Northern Metropolitan,4019.0
4,Abbotsford,55a Park St,4,h,1600000.0,VB,Nelson,4/06/2016,2.5,3067.0,...,1.0,2.0,120.0,142.0,2014.0,Yarra,-37.80720,144.99410,Northern Metropolitan,4019.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
13575,Wheelers Hill,12 Strada Cr,4,h,1245000.0,S,Barry,26/08/2017,16.7,3150.0,...,2.0,2.0,652.0,,1981.0,,-37.90562,145.16761,South-Eastern Metropolitan,7392.0
13576,Williamstown,77 Merrett Dr,3,h,1031000.0,SP,Williams,26/08/2017,6.8,3016.0,...,2.0,2.0,333.0,133.0,1995.0,,-37.85927,144.87904,Western Metropolitan,6380.0
13577,Williamstown,83 Power St,3,h,1170000.0,S,Raine,26/08/2017,6.8,3016.0,...,2.0,4.0,436.0,,1997.0,,-37.85274,144.88738,Western Metropolitan,6380.0
13578,Williamstown,96 Verdon St,4,h,2500000.0,PI,Sweeney,26/08/2017,6.8,3016.0,...,1.0,5.0,866.0,157.0,1920.0,,-37.85908,144.89299,Western Metropolitan,6380.0


- we can access a column using "dot notation", like an attribute of the dataframe

In [10]:
#here we are accessing Suburb column
house_data.Suburb

0           Abbotsford
1           Abbotsford
2           Abbotsford
3           Abbotsford
4           Abbotsford
             ...      
13575    Wheelers Hill
13576     Williamstown
13577     Williamstown
13578     Williamstown
13579       Yarraville
Name: Suburb, Length: 13580, dtype: object

- we can also access a column like a dictionary value, using the "[" "]" square brackets

In [11]:
#using [] and accessing Address column
house_data["Address"]

0            85 Turner St
1         25 Bloomburg St
2            5 Charles St
3        40 Federation La
4             55a Park St
               ...       
13575        12 Strada Cr
13576       77 Merrett Dr
13577         83 Power St
13578        96 Verdon St
13579          6 Agnes St
Name: Address, Length: 13580, dtype: object

- and we can take a single value from the column with a second "[" "]" (with the default index, the labels match the positions)

In [12]:
#using index position
house_data["Address"][0]

'85 Turner St'

###indexing in pandas
- the indexing operators shown above are the same ones python uses for its own objects
- pandas also has its own accessor operators, <b>loc</b> and <b>iloc</b>

In [13]:
#iloc - index based selection
house_data.iloc[0]

Suburb                      Abbotsford
Address                   85 Turner St
Rooms                                2
Type                                 h
Price                        1480000.0
Method                               S
SellerG                         Biggin
Date                         3/12/2016
Distance                           2.5
Postcode                        3067.0
Bedroom2                           2.0
Bathroom                           1.0
Car                                1.0
Landsize                         202.0
BuildingArea                       NaN
YearBuilt                          NaN
CouncilArea                      Yarra
Lattitude                     -37.7996
Longtitude                    144.9984
Regionname       Northern Metropolitan
Propertycount                   4019.0
Name: 0, dtype: object

- both "iloc" and "loc" are row-first, column-second
- this is the opposite of the "[" "]" access shown earlier, where we picked the column first and the row second
- this means that it's marginally easier to retrieve rows, and marginally harder to retrieve columns

In [None]:
house_data.iloc[:, 0] #all rows of the first column ("Suburb")

0           Abbotsford
1           Abbotsford
2           Abbotsford
3           Abbotsford
4           Abbotsford
             ...      
13575    Wheelers Hill
13576     Williamstown
13577     Williamstown
13578     Williamstown
13579       Yarraville
Name: Suburb, Length: 13580, dtype: object

In [14]:
#next we are going to get the first 3 rows of the first column
house_data.iloc[:3, 0]

0    Abbotsford
1    Abbotsford
2    Abbotsford
Name: Suburb, dtype: object

In [15]:
#or selecting just the second and third entries
house_data.iloc[1:3, 0]

1    Abbotsford
2    Abbotsford
Name: Suburb, dtype: object

In [16]:
#it is also possible to pass a list
house_data.iloc[[0, 1, 2], 0]

0    Abbotsford
1    Abbotsford
2    Abbotsford
Name: Suburb, dtype: object

In [17]:
#we can use negative numbers to start counting from the end
house_data.iloc[-5:] #it will return the last 5 rows

Unnamed: 0,Suburb,Address,Rooms,Type,Price,Method,SellerG,Date,Distance,Postcode,...,Bathroom,Car,Landsize,BuildingArea,YearBuilt,CouncilArea,Lattitude,Longtitude,Regionname,Propertycount
13575,Wheelers Hill,12 Strada Cr,4,h,1245000.0,S,Barry,26/08/2017,16.7,3150.0,...,2.0,2.0,652.0,,1981.0,,-37.90562,145.16761,South-Eastern Metropolitan,7392.0
13576,Williamstown,77 Merrett Dr,3,h,1031000.0,SP,Williams,26/08/2017,6.8,3016.0,...,2.0,2.0,333.0,133.0,1995.0,,-37.85927,144.87904,Western Metropolitan,6380.0
13577,Williamstown,83 Power St,3,h,1170000.0,S,Raine,26/08/2017,6.8,3016.0,...,2.0,4.0,436.0,,1997.0,,-37.85274,144.88738,Western Metropolitan,6380.0
13578,Williamstown,96 Verdon St,4,h,2500000.0,PI,Sweeney,26/08/2017,6.8,3016.0,...,1.0,5.0,866.0,157.0,1920.0,,-37.85908,144.89299,Western Metropolitan,6380.0
13579,Yarraville,6 Agnes St,4,h,1285000.0,SP,Village,26/08/2017,6.3,3013.0,...,1.0,1.0,362.0,112.0,1920.0,,-37.81188,144.88449,Western Metropolitan,6543.0


In [18]:
#loc - label based selection
house_data.loc[0, 'Suburb']

'Abbotsford'

- it's the index label of the data, not its position, that matters

In [19]:
house_data.loc[:, ['Suburb', 'Address', 'Rooms', 'Type']]

Unnamed: 0,Suburb,Address,Rooms,Type
0,Abbotsford,85 Turner St,2,h
1,Abbotsford,25 Bloomburg St,2,h
2,Abbotsford,5 Charles St,3,h
3,Abbotsford,40 Federation La,3,h
4,Abbotsford,55a Park St,4,h
...,...,...,...,...
13575,Wheelers Hill,12 Strada Cr,4,h
13576,Williamstown,77 Merrett Dr,3,h
13577,Williamstown,83 Power St,3,h
13578,Williamstown,96 Verdon St,4,h


- "iloc" is conceptually simpler than "loc"
- it ignores the dataset's indices
- when we use "iloc" we treat the dataset like a big matrix (a list of lists) that we index by position
- "loc", by contrast, uses the information in the indices to do its work
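
The difference is easiest to see when the index is not the default 0, 1, 2. The sketch below uses a toy dataframe (made up for illustration) with the labels 10, 20 and 30; it also shows that an "iloc" slice excludes its end position while a "loc" slice includes its end label.

```python
import pandas as pd

# toy data with a non-default index, for illustration only
df = pd.DataFrame({"Suburb": ["Abbotsford", "Williamstown", "Yarraville"],
                   "Rooms": [2, 3, 4]},
                  index=[10, 20, 30])

# iloc uses positions: row 0 is the first row, whatever its label is
first_by_position = df.iloc[0, 0]        # 'Abbotsford'

# loc uses labels: the first row has the label 10, not 0
first_by_label = df.loc[10, "Suburb"]    # 'Abbotsford'

# slices differ too: iloc excludes the end position, loc includes the end label
by_position = df.iloc[0:2]               # positions 0 and 1 -> 2 rows
by_label = df.loc[10:30]                 # labels 10, 20 and 30 -> 3 rows

print(len(by_position), len(by_label))   # 2 3
```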

###Manipulating the index
- label-based selection derives its power from the labels in the index.
- the index we use is not immutable
- we can manipulate the index in any way we see fit

In [20]:
#set_index() - sets one of the columns as the index
house_data.set_index('SellerG')

SellerG,Suburb,Address,Rooms,Type,Price,Method,Date,Distance,Postcode,Bedroom2,Bathroom,Car,Landsize,BuildingArea,YearBuilt,CouncilArea,Lattitude,Longtitude,Regionname,Propertycount
Biggin,Abbotsford,85 Turner St,2,h,1480000.0,S,3/12/2016,2.5,3067.0,2.0,1.0,1.0,202.0,,,Yarra,-37.79960,144.99840,Northern Metropolitan,4019.0
Biggin,Abbotsford,25 Bloomburg St,2,h,1035000.0,S,4/02/2016,2.5,3067.0,2.0,1.0,0.0,156.0,79.0,1900.0,Yarra,-37.80790,144.99340,Northern Metropolitan,4019.0
Biggin,Abbotsford,5 Charles St,3,h,1465000.0,SP,4/03/2017,2.5,3067.0,3.0,2.0,0.0,134.0,150.0,1900.0,Yarra,-37.80930,144.99440,Northern Metropolitan,4019.0
Biggin,Abbotsford,40 Federation La,3,h,850000.0,PI,4/03/2017,2.5,3067.0,3.0,2.0,1.0,94.0,,,Yarra,-37.79690,144.99690,Northern Metropolitan,4019.0
Nelson,Abbotsford,55a Park St,4,h,1600000.0,VB,4/06/2016,2.5,3067.0,3.0,1.0,2.0,120.0,142.0,2014.0,Yarra,-37.80720,144.99410,Northern Metropolitan,4019.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Barry,Wheelers Hill,12 Strada Cr,4,h,1245000.0,S,26/08/2017,16.7,3150.0,4.0,2.0,2.0,652.0,,1981.0,,-37.90562,145.16761,South-Eastern Metropolitan,7392.0
Williams,Williamstown,77 Merrett Dr,3,h,1031000.0,SP,26/08/2017,6.8,3016.0,3.0,2.0,2.0,333.0,133.0,1995.0,,-37.85927,144.87904,Western Metropolitan,6380.0
Raine,Williamstown,83 Power St,3,h,1170000.0,S,26/08/2017,6.8,3016.0,3.0,2.0,4.0,436.0,,1997.0,,-37.85274,144.88738,Western Metropolitan,6380.0
Sweeney,Williamstown,96 Verdon St,4,h,2500000.0,PI,26/08/2017,6.8,3016.0,4.0,1.0,5.0,866.0,157.0,1920.0,,-37.85908,144.89299,Western Metropolitan,6380.0


- this is useful when you can come up with an index for the dataset that is better than the current one
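
A small sketch of why a different index can help, assuming a toy dataframe made up for illustration: once "SellerG" is the index, its values can be used as labels with "loc".

```python
import pandas as pd

# toy data for illustration only
df = pd.DataFrame({"SellerG": ["Biggin", "Nelson", "Biggin"],
                   "Suburb": ["Abbotsford", "Abbotsford", "Prahran"],
                   "Price": [1480000.0, 1600000.0, 2668000.0]})

# set_index() returns a new dataframe; df itself keeps its 0, 1, 2 index
by_seller = df.set_index("SellerG")

# the labels can now be used with loc; a repeated label returns every matching row
biggin_rows = by_seller.loc["Biggin"]

print(biggin_rows)
print(list(df.index))   # [0, 1, 2]
```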

###Conditional Selection


In [21]:
#we are going to check, for each row, whether SellerG is 'Biggin'
house_data.SellerG == 'Biggin'

0         True
1         True
2         True
3         True
4        False
         ...  
13575    False
13576    False
13577    False
13578    False
13579    False
Name: SellerG, Length: 13580, dtype: bool

- this produces a series of True/False values,
we can pass this series to "loc" to select the relevant rows

In [22]:
house_data.loc[house_data.SellerG == 'Biggin']

Unnamed: 0,Suburb,Address,Rooms,Type,Price,Method,SellerG,Date,Distance,Postcode,...,Bathroom,Car,Landsize,BuildingArea,YearBuilt,CouncilArea,Lattitude,Longtitude,Regionname,Propertycount
0,Abbotsford,85 Turner St,2,h,1480000.0,S,Biggin,3/12/2016,2.5,3067.0,...,1.0,1.0,202.0,,,Yarra,-37.79960,144.99840,Northern Metropolitan,4019.0
1,Abbotsford,25 Bloomburg St,2,h,1035000.0,S,Biggin,4/02/2016,2.5,3067.0,...,1.0,0.0,156.0,79.0,1900.0,Yarra,-37.80790,144.99340,Northern Metropolitan,4019.0
2,Abbotsford,5 Charles St,3,h,1465000.0,SP,Biggin,4/03/2017,2.5,3067.0,...,2.0,0.0,134.0,150.0,1900.0,Yarra,-37.80930,144.99440,Northern Metropolitan,4019.0
3,Abbotsford,40 Federation La,3,h,850000.0,PI,Biggin,4/03/2017,2.5,3067.0,...,2.0,1.0,94.0,,,Yarra,-37.79690,144.99690,Northern Metropolitan,4019.0
8,Abbotsford,6/241 Nicholson St,1,u,300000.0,S,Biggin,8/10/2016,2.5,3067.0,...,1.0,1.0,0.0,,,Yarra,-37.80080,144.99730,Northern Metropolitan,4019.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
13480,Maidstone,3 Wallace St,3,h,735000.0,S,Biggin,26/08/2017,6.4,3012.0,...,1.0,1.0,167.0,110.0,2010.0,,-37.78817,144.87873,Western Metropolitan,3873.0
13497,Moonee Ponds,53 Moore St,4,h,1305000.0,SP,Biggin,26/08/2017,6.2,3039.0,...,1.0,1.0,449.0,129.0,1910.0,,-37.77081,144.92074,Western Metropolitan,6232.0
13505,Mulgrave,2229 Dandenong Rd,3,h,940000.0,S,Biggin,26/08/2017,18.8,3170.0,...,2.0,3.0,709.0,140.0,1958.0,,-37.92811,145.14971,South-Eastern Metropolitan,7113.0
13523,Prahran,69 Greville St,4,h,2668000.0,S,Biggin,26/08/2017,4.6,3181.0,...,2.0,2.0,383.0,,,,-37.84879,144.98882,Southern Metropolitan,7717.0


In [23]:
#we can also combine multiple conditions with and "&"
#here we select rows where SellerG is 'Biggin' and Rooms is at least 3
house_data.loc[(house_data.SellerG == 'Biggin') & (house_data.Rooms >= 3)]

Unnamed: 0,Suburb,Address,Rooms,Type,Price,Method,SellerG,Date,Distance,Postcode,...,Bathroom,Car,Landsize,BuildingArea,YearBuilt,CouncilArea,Lattitude,Longtitude,Regionname,Propertycount
2,Abbotsford,5 Charles St,3,h,1465000.0,SP,Biggin,4/03/2017,2.5,3067.0,...,2.0,0.0,134.0,150.0,1900.0,Yarra,-37.80930,144.99440,Northern Metropolitan,4019.0
3,Abbotsford,40 Federation La,3,h,850000.0,PI,Biggin,4/03/2017,2.5,3067.0,...,2.0,1.0,94.0,,,Yarra,-37.79690,144.99690,Northern Metropolitan,4019.0
31,Abbotsford,166 Gipps St,3,h,1290000.0,S,Biggin,25/02/2017,2.5,3067.0,...,2.0,2.0,147.0,18.0,,Yarra,-37.80500,144.99430,Northern Metropolitan,4019.0
32,Abbotsford,60 Stafford St,3,h,1290000.0,S,Biggin,25/02/2017,2.5,3067.0,...,1.0,1.0,168.0,124.0,1950.0,Yarra,-37.80070,144.99580,Northern Metropolitan,4019.0
1016,Braybrook,80 South Rd,4,h,645000.0,SP,Biggin,7/11/2016,10.8,3019.0,...,2.0,1.0,283.0,154.0,1990.0,Maribyrnong,-37.79080,144.84850,Western Metropolitan,3589.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
13480,Maidstone,3 Wallace St,3,h,735000.0,S,Biggin,26/08/2017,6.4,3012.0,...,1.0,1.0,167.0,110.0,2010.0,,-37.78817,144.87873,Western Metropolitan,3873.0
13497,Moonee Ponds,53 Moore St,4,h,1305000.0,SP,Biggin,26/08/2017,6.2,3039.0,...,1.0,1.0,449.0,129.0,1910.0,,-37.77081,144.92074,Western Metropolitan,6232.0
13505,Mulgrave,2229 Dandenong Rd,3,h,940000.0,S,Biggin,26/08/2017,18.8,3170.0,...,2.0,3.0,709.0,140.0,1958.0,,-37.92811,145.14971,South-Eastern Metropolitan,7113.0
13523,Prahran,69 Greville St,4,h,2668000.0,S,Biggin,26/08/2017,4.6,3181.0,...,2.0,2.0,383.0,,,,-37.84879,144.98882,Southern Metropolitan,7717.0


In [24]:
#next we are going to use or "|" (pipe)
#here we select rows where at least one of the conditions is true
house_data.loc[(house_data.SellerG == 'Biggin') | (house_data.Rooms >= 3)]

Unnamed: 0,Suburb,Address,Rooms,Type,Price,Method,SellerG,Date,Distance,Postcode,...,Bathroom,Car,Landsize,BuildingArea,YearBuilt,CouncilArea,Lattitude,Longtitude,Regionname,Propertycount
0,Abbotsford,85 Turner St,2,h,1480000.0,S,Biggin,3/12/2016,2.5,3067.0,...,1.0,1.0,202.0,,,Yarra,-37.79960,144.99840,Northern Metropolitan,4019.0
1,Abbotsford,25 Bloomburg St,2,h,1035000.0,S,Biggin,4/02/2016,2.5,3067.0,...,1.0,0.0,156.0,79.0,1900.0,Yarra,-37.80790,144.99340,Northern Metropolitan,4019.0
2,Abbotsford,5 Charles St,3,h,1465000.0,SP,Biggin,4/03/2017,2.5,3067.0,...,2.0,0.0,134.0,150.0,1900.0,Yarra,-37.80930,144.99440,Northern Metropolitan,4019.0
3,Abbotsford,40 Federation La,3,h,850000.0,PI,Biggin,4/03/2017,2.5,3067.0,...,2.0,1.0,94.0,,,Yarra,-37.79690,144.99690,Northern Metropolitan,4019.0
4,Abbotsford,55a Park St,4,h,1600000.0,VB,Nelson,4/06/2016,2.5,3067.0,...,1.0,2.0,120.0,142.0,2014.0,Yarra,-37.80720,144.99410,Northern Metropolitan,4019.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
13575,Wheelers Hill,12 Strada Cr,4,h,1245000.0,S,Barry,26/08/2017,16.7,3150.0,...,2.0,2.0,652.0,,1981.0,,-37.90562,145.16761,South-Eastern Metropolitan,7392.0
13576,Williamstown,77 Merrett Dr,3,h,1031000.0,SP,Williams,26/08/2017,6.8,3016.0,...,2.0,2.0,333.0,133.0,1995.0,,-37.85927,144.87904,Western Metropolitan,6380.0
13577,Williamstown,83 Power St,3,h,1170000.0,S,Raine,26/08/2017,6.8,3016.0,...,2.0,4.0,436.0,,1997.0,,-37.85274,144.88738,Western Metropolitan,6380.0
13578,Williamstown,96 Verdon St,4,h,2500000.0,PI,Sweeney,26/08/2017,6.8,3016.0,...,1.0,5.0,866.0,157.0,1920.0,,-37.85908,144.89299,Western Metropolitan,6380.0


In [25]:
#isin() - selects rows whose value is in the given list
house_data.loc[house_data.SellerG.isin(['Biggin', 'Williams'])]

Unnamed: 0,Suburb,Address,Rooms,Type,Price,Method,SellerG,Date,Distance,Postcode,...,Bathroom,Car,Landsize,BuildingArea,YearBuilt,CouncilArea,Lattitude,Longtitude,Regionname,Propertycount
0,Abbotsford,85 Turner St,2,h,1480000.0,S,Biggin,3/12/2016,2.5,3067.0,...,1.0,1.0,202.0,,,Yarra,-37.79960,144.99840,Northern Metropolitan,4019.0
1,Abbotsford,25 Bloomburg St,2,h,1035000.0,S,Biggin,4/02/2016,2.5,3067.0,...,1.0,0.0,156.0,79.0,1900.0,Yarra,-37.80790,144.99340,Northern Metropolitan,4019.0
2,Abbotsford,5 Charles St,3,h,1465000.0,SP,Biggin,4/03/2017,2.5,3067.0,...,2.0,0.0,134.0,150.0,1900.0,Yarra,-37.80930,144.99440,Northern Metropolitan,4019.0
3,Abbotsford,40 Federation La,3,h,850000.0,PI,Biggin,4/03/2017,2.5,3067.0,...,2.0,1.0,94.0,,,Yarra,-37.79690,144.99690,Northern Metropolitan,4019.0
8,Abbotsford,6/241 Nicholson St,1,u,300000.0,S,Biggin,8/10/2016,2.5,3067.0,...,1.0,1.0,0.0,,,Yarra,-37.80080,144.99730,Northern Metropolitan,4019.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
13497,Moonee Ponds,53 Moore St,4,h,1305000.0,SP,Biggin,26/08/2017,6.2,3039.0,...,1.0,1.0,449.0,129.0,1910.0,,-37.77081,144.92074,Western Metropolitan,6232.0
13505,Mulgrave,2229 Dandenong Rd,3,h,940000.0,S,Biggin,26/08/2017,18.8,3170.0,...,2.0,3.0,709.0,140.0,1958.0,,-37.92811,145.14971,South-Eastern Metropolitan,7113.0
13523,Prahran,69 Greville St,4,h,2668000.0,S,Biggin,26/08/2017,4.6,3181.0,...,2.0,2.0,383.0,,,,-37.84879,144.98882,Southern Metropolitan,7717.0
13531,Richmond,170 Coppin St,4,h,1800000.0,S,Biggin,26/08/2017,2.4,3121.0,...,1.0,4.0,418.0,,,,-37.82368,145.00271,Northern Metropolitan,14949.0


In [26]:
#notnull() - selects rows whose value is not empty (NaN)
house_data.loc[house_data.Price.notnull()]

Unnamed: 0,Suburb,Address,Rooms,Type,Price,Method,SellerG,Date,Distance,Postcode,...,Bathroom,Car,Landsize,BuildingArea,YearBuilt,CouncilArea,Lattitude,Longtitude,Regionname,Propertycount
0,Abbotsford,85 Turner St,2,h,1480000.0,S,Biggin,3/12/2016,2.5,3067.0,...,1.0,1.0,202.0,,,Yarra,-37.79960,144.99840,Northern Metropolitan,4019.0
1,Abbotsford,25 Bloomburg St,2,h,1035000.0,S,Biggin,4/02/2016,2.5,3067.0,...,1.0,0.0,156.0,79.0,1900.0,Yarra,-37.80790,144.99340,Northern Metropolitan,4019.0
2,Abbotsford,5 Charles St,3,h,1465000.0,SP,Biggin,4/03/2017,2.5,3067.0,...,2.0,0.0,134.0,150.0,1900.0,Yarra,-37.80930,144.99440,Northern Metropolitan,4019.0
3,Abbotsford,40 Federation La,3,h,850000.0,PI,Biggin,4/03/2017,2.5,3067.0,...,2.0,1.0,94.0,,,Yarra,-37.79690,144.99690,Northern Metropolitan,4019.0
4,Abbotsford,55a Park St,4,h,1600000.0,VB,Nelson,4/06/2016,2.5,3067.0,...,1.0,2.0,120.0,142.0,2014.0,Yarra,-37.80720,144.99410,Northern Metropolitan,4019.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
13575,Wheelers Hill,12 Strada Cr,4,h,1245000.0,S,Barry,26/08/2017,16.7,3150.0,...,2.0,2.0,652.0,,1981.0,,-37.90562,145.16761,South-Eastern Metropolitan,7392.0
13576,Williamstown,77 Merrett Dr,3,h,1031000.0,SP,Williams,26/08/2017,6.8,3016.0,...,2.0,2.0,333.0,133.0,1995.0,,-37.85927,144.87904,Western Metropolitan,6380.0
13577,Williamstown,83 Power St,3,h,1170000.0,S,Raine,26/08/2017,6.8,3016.0,...,2.0,4.0,436.0,,1997.0,,-37.85274,144.88738,Western Metropolitan,6380.0
13578,Williamstown,96 Verdon St,4,h,2500000.0,PI,Sweeney,26/08/2017,6.8,3016.0,...,1.0,5.0,866.0,157.0,1920.0,,-37.85908,144.89299,Western Metropolitan,6380.0
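
The conditional selections above can be checked by hand on a toy dataframe. The data below is made up for illustration, small enough that the expected rows are easy to verify.

```python
import pandas as pd

# toy data for illustration only
df = pd.DataFrame({"SellerG": ["Biggin", "Nelson", "Williams", "Biggin"],
                   "Rooms": [2, 4, 3, 3],
                   "Price": [1480000.0, None, 1031000.0, 850000.0]})

is_biggin = df.SellerG == "Biggin"     # boolean series, one value per row
big = df.Rooms >= 3

# when the conditions are written inline, each one needs its own parentheses,
# because & and | bind more tightly than == and >=
both = df.loc[is_biggin & big]                              # row 3
either = df.loc[is_biggin | big]                            # rows 0, 1, 2, 3
listed = df.loc[df.SellerG.isin(["Biggin", "Williams"])]    # rows 0, 2, 3
priced = df.loc[df.Price.notnull()]                         # rows 0, 2, 3

print(list(both.index), list(either.index), list(listed.index), list(priced.index))
```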


In [None]:
#assigning data into a column
#house_data['SellerG'] = 'Arun'
#house_data['SellerG']
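
The commented-out assignment above overwrites a whole column, so here is a sketch on a throwaway dataframe instead of `house_data`. The data and the `row_id` column name are made up for illustration.

```python
import pandas as pd

# toy data for illustration only
df = pd.DataFrame({"SellerG": ["Biggin", "Nelson", "Raine"]})

# assigning a constant fills the whole column with that value
df["SellerG"] = "Arun"

# assigning an iterable of the same length sets one value per row
df["row_id"] = range(len(df), 0, -1)

print(df)
```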

##Summary Functions and Maps

In [27]:
#summary functions
#describe() - this method generates a high-level summary of the attributes of the given column
house_data.Price.describe()

count    1.358000e+04
mean     1.075684e+06
std      6.393107e+05
min      8.500000e+04
25%      6.500000e+05
50%      9.030000e+05
75%      1.330000e+06
max      9.000000e+06
Name: Price, dtype: float64

In [28]:
#to see the mean of the column we can use mean()
house_data.Price.mean()

1075684.079455081

In [29]:
#to see a list of the unique values we can use unique()
house_data.SellerG.unique()

array(['Biggin', 'Nelson', 'Jellis', 'Greg', 'LITTLE', 'Collins', 'Kay',
       'Beller', 'Marshall', 'Brad', 'Maddison', 'Barry', 'Considine',
       'Rendina', 'Propertyau', 'McDonald', 'Prof.', 'Harcourts',
       'hockingstuart', 'Thomson', 'Buxton', 'RT', "Sotheby's", 'Cayzer',
       'Chisholm', 'Brace', 'Miles', 'McGrath', 'Love', 'Barlow',
       'Sweeney', 'Village', 'Jas', 'Gunn&Co', 'Burnham', 'Williams',
       'Compton', 'FN', 'Raine&Horne', 'Hunter', 'Noel', 'Hodges', 'Ray',
       'Gary', 'Fletchers', 'Woodards', 'Raine', 'Walshe', 'Alexkarbon',
       'Weda', 'Frank', 'Stockdale', 'Tim', 'Purplebricks', 'Moonee',
       'HAR', 'Edward', 'Philip', 'RW', 'North', 'Ascend', 'Christopher',
       'Mandy', 'R&H', 'Fletchers/One', 'Assisi', 'One', "O'Brien", 'C21',
       'Bayside', 'Paul', 'First', 'Matthew', 'Anderson', 'Nick',
       'Lindellas', 'Allens', 'Bells', 'Trimson', 'Douglas', 'YPA', 'GL',
       "Tiernan's", 'J', 'Harrington', 'Dingle', 'Chambers', 'Peter',
    

In [30]:
#to see how often each unique value occurs we can use value_counts()
house_data.SellerG.value_counts()

Nelson           1565
Jellis           1316
hockingstuart    1167
Barry            1011
Ray               701
                 ... 
Prowse              1
Luxe                1
Zahn                1
Homes               1
Point               1
Name: SellerG, Length: 268, dtype: int64

In [31]:
#map() - takes one set of values and "maps" them to another set of values
house_data_price_mean = house_data.Price.mean()
house_data.Price.map(lambda p: p-house_data_price_mean)

0        4.043159e+05
1       -4.068408e+04
2        3.893159e+05
3       -2.256841e+05
4        5.243159e+05
             ...     
13575    1.693159e+05
13576   -4.468408e+04
13577    9.431592e+04
13578    1.424316e+06
13579    2.093159e+05
Name: Price, Length: 13580, dtype: float64

- The function you pass to map() should expect a single value from the Series (a price value, in the above example), and return a transformed version of that value. map() returns a new Series where all the values have been transformed by your function.
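
The same re-centering can be followed by hand on three toy prices (made up for illustration), which also shows that the original series is left untouched:

```python
import pandas as pd

# toy prices for illustration only
prices = pd.Series([100.0, 200.0, 300.0], name="Price")
mean_price = prices.mean()          # 200.0

# the lambda receives one price at a time and returns the transformed value
centered = prices.map(lambda p: p - mean_price)

print(list(centered))   # [-100.0, 0.0, 100.0]
print(list(prices))     # [100.0, 200.0, 300.0], the original series is unchanged
```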

In [32]:
#apply() is the equivalent method for transforming a whole dataframe, by calling a function on each row
def remean_price(row):
  row.Price = row.Price - house_data_price_mean
  return row

house_data.apply(remean_price, axis="columns")

Unnamed: 0,Suburb,Address,Rooms,Type,Price,Method,SellerG,Date,Distance,Postcode,...,Bathroom,Car,Landsize,BuildingArea,YearBuilt,CouncilArea,Lattitude,Longtitude,Regionname,Propertycount
0,Abbotsford,85 Turner St,2,h,4.043159e+05,S,Biggin,3/12/2016,2.5,3067.0,...,1.0,1.0,202.0,,,Yarra,-37.79960,144.99840,Northern Metropolitan,4019.0
1,Abbotsford,25 Bloomburg St,2,h,-4.068408e+04,S,Biggin,4/02/2016,2.5,3067.0,...,1.0,0.0,156.0,79.0,1900.0,Yarra,-37.80790,144.99340,Northern Metropolitan,4019.0
2,Abbotsford,5 Charles St,3,h,3.893159e+05,SP,Biggin,4/03/2017,2.5,3067.0,...,2.0,0.0,134.0,150.0,1900.0,Yarra,-37.80930,144.99440,Northern Metropolitan,4019.0
3,Abbotsford,40 Federation La,3,h,-2.256841e+05,PI,Biggin,4/03/2017,2.5,3067.0,...,2.0,1.0,94.0,,,Yarra,-37.79690,144.99690,Northern Metropolitan,4019.0
4,Abbotsford,55a Park St,4,h,5.243159e+05,VB,Nelson,4/06/2016,2.5,3067.0,...,1.0,2.0,120.0,142.0,2014.0,Yarra,-37.80720,144.99410,Northern Metropolitan,4019.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
13575,Wheelers Hill,12 Strada Cr,4,h,1.693159e+05,S,Barry,26/08/2017,16.7,3150.0,...,2.0,2.0,652.0,,1981.0,,-37.90562,145.16761,South-Eastern Metropolitan,7392.0
13576,Williamstown,77 Merrett Dr,3,h,-4.468408e+04,SP,Williams,26/08/2017,6.8,3016.0,...,2.0,2.0,333.0,133.0,1995.0,,-37.85927,144.87904,Western Metropolitan,6380.0
13577,Williamstown,83 Power St,3,h,9.431592e+04,S,Raine,26/08/2017,6.8,3016.0,...,2.0,4.0,436.0,,1997.0,,-37.85274,144.88738,Western Metropolitan,6380.0
13578,Williamstown,96 Verdon St,4,h,1.424316e+06,PI,Sweeney,26/08/2017,6.8,3016.0,...,1.0,5.0,866.0,157.0,1920.0,,-37.85908,144.89299,Western Metropolitan,6380.0


- note that map() and apply() return new, transformed data, they don't modify the original data

In [33]:
#we take the first row of the original data to check that it has not changed
house_data.head(1)

Unnamed: 0,Suburb,Address,Rooms,Type,Price,Method,SellerG,Date,Distance,Postcode,...,Bathroom,Car,Landsize,BuildingArea,YearBuilt,CouncilArea,Lattitude,Longtitude,Regionname,Propertycount
0,Abbotsford,85 Turner St,2,h,1480000.0,S,Biggin,3/12/2016,2.5,3067.0,...,1.0,1.0,202.0,,,Yarra,-37.7996,144.9984,Northern Metropolitan,4019.0


##Grouping and Sorting
- map() allows us to transform data in a dataframe one value at a time for an entire column
- we use groupby() to group our data and then do something specific to each group

In [34]:
#group-wise analysis using the groupby() method
house_data.groupby('Car').Car.count()

Car
0.0     1026
1.0     5509
2.0     5591
3.0      748
4.0      506
5.0       63
6.0       54
7.0        8
8.0        9
9.0        1
10.0       3
Name: Car, dtype: int64

In [35]:
#here we group by Car and get the lowest price in each group
house_data.groupby('Car').Price.min()

Car
0.0       85000.0
1.0      145000.0
2.0      131000.0
3.0      370000.0
4.0      295000.0
5.0      531000.0
6.0      451000.0
7.0      560000.0
8.0      580000.0
9.0     2100000.0
10.0     880000.0
Name: Price, dtype: float64
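
The two group-wise calls above can be verified by hand on a toy dataframe (made up for illustration) with only two Car groups:

```python
import pandas as pd

# toy data for illustration only
df = pd.DataFrame({"Car": [1.0, 2.0, 1.0, 2.0, 2.0],
                   "Price": [500.0, 900.0, 300.0, 700.0, 800.0]})

# how many rows fall in each Car group
counts = df.groupby("Car").Car.count()

# the lowest Price inside each Car group
cheapest = df.groupby("Car").Price.min()

print(counts.to_dict())     # {1.0: 2, 2.0: 3}
print(cheapest.to_dict())   # {1.0: 300.0, 2.0: 700.0}
```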

In [37]:
#applying a lambda to each group: the Suburb of the first house at each price
house_data.groupby('Price').apply(lambda df: df.Suburb.iloc[0])

Price
85000.0        Footscray
131000.0       Caulfield
145000.0          Coburg
160000.0        Hawthorn
170000.0       Footscray
                ...     
6400000.0    Middle Park
6500000.0            Kew
7650000.0       Hawthorn
8000000.0     Canterbury
9000000.0       Mulgrave
Length: 2204, dtype: object

In [41]:
#grouping by two columns: the house with the most rooms for each Car and Price pair
house_data.groupby(['Car', 'Price']).apply(lambda df: df.loc[df.Rooms.idxmax()])

Car,Price,Suburb,Address,Rooms,Type,Price,Method,SellerG,Date,Distance,Postcode,...,Bathroom,Car,Landsize,BuildingArea,YearBuilt,CouncilArea,Lattitude,Longtitude,Regionname,Propertycount
0.0,85000.0,Footscray,202/51 Gordon St,1,u,85000.0,PI,Burnham,3/09/2016,6.4,3011.0,...,1.0,0.0,0.0,,2007.0,Maribyrnong,-37.79110,144.89000,Western Metropolitan,7570.0
0.0,160000.0,Hawthorn,17/17 Park St,1,u,160000.0,VB,HAR,8/04/2017,4.6,3122.0,...,1.0,0.0,322.0,,2009.0,Boroondara,-37.81980,145.03730,Southern Metropolitan,11308.0
0.0,170000.0,Footscray,10/30 Pickett St,1,u,170000.0,PI,Burnham,1/07/2017,5.1,3011.0,...,1.0,0.0,30.0,26.0,2013.0,Maribyrnong,-37.80141,144.89587,Western Metropolitan,7570.0
0.0,210000.0,Melbourne,928/43 Therry St,1,u,210000.0,VB,Greg,13/08/2016,2.8,3000.0,...,1.0,0.0,0.0,,,Melbourne,-37.80780,144.96100,Northern Metropolitan,17496.0
0.0,222000.0,Moonee Ponds,7/110 Maribyrnong Rd,1,u,222000.0,S,Barry,17/06/2017,6.2,3039.0,...,1.0,0.0,0.0,29.0,2012.0,Moonee Valley,-37.77154,144.91597,Western Metropolitan,6232.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8.0,3850000.0,Northcote,215 Clarke St,4,h,3850000.0,PI,Jellis,30/07/2016,5.5,3070.0,...,4.0,8.0,1390.0,,1912.0,Darebin,-37.77670,144.99960,Northern Metropolitan,11364.0
9.0,2100000.0,Surrey Hills,1093 Riversdale Rd,3,h,2100000.0,VB,Jellis,1/07/2017,10.2,3127.0,...,1.0,9.0,841.0,124.0,1960.0,Whitehorse,-37.83729,145.10929,Southern Metropolitan,5457.0
10.0,880000.0,Dandenong,1462 Heatherton Rd,3,h,880000.0,S,Barry,22/07/2017,24.7,3175.0,...,2.0,10.0,734.0,,,Greater Dandenong,-37.96969,145.21043,South-Eastern Metropolitan,10894.0
10.0,925000.0,Bayswater,95 Orange Gr,4,h,925000.0,SP,Biggin,17/06/2017,23.2,3153.0,...,1.0,10.0,993.0,128.0,1966.0,Knox,-37.84688,145.25632,Eastern Metropolitan,5030.0


In [42]:
#using agg() to run a bunch of different functions on each group at once
house_data.groupby(['Suburb']).Price.agg([len, min, max])

Suburb,len,min,max
Abbotsford,56,300000.0,1876000.0
Aberfeldie,44,280000.0,3900000.0
Airport West,67,440000.0,1250000.0
Albanvale,6,415000.0,655000.0
Albert Park,69,442500.0,4735000.0
...,...,...,...
Wonga Park,1,900000.0,900000.0
Wyndham Vale,4,475000.0,500000.0
Yallambie,24,602500.0,1670000.0
Yarra Glen,1,620000.0,620000.0
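
agg() also accepts function names as strings. The sketch below is an alternative spelling on toy data (made up for illustration), not the call used above; note that "count" only counts non-missing values, while len counts every row in the group.

```python
import pandas as pd

# toy data for illustration only
df = pd.DataFrame({"Suburb": ["Abbotsford", "Abbotsford", "Yarraville"],
                   "Price": [300000.0, 1876000.0, 1285000.0]})

# function names given as strings; each one becomes an output column
summary = df.groupby("Suburb").Price.agg(["count", "min", "max"])

print(summary)
```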


In [44]:
#grouping by more than one column gives the result a multi-index
house_data_reviewed = house_data.groupby(['SellerG', 'Car']).Suburb.agg([len])
house_data_reviewed

SellerG,Car,len
@Realty,0.0,1
@Realty,3.0,1
ASL,1.0,2
ASL,2.0,2
Abercromby's,0.0,1
...,...,...
iTRAK,1.0,2
iTRAK,2.0,5
iTRAK,3.0,2
iTRAK,4.0,2


In [45]:
mi = house_data_reviewed.index
type(mi)

pandas.core.indexes.multi.MultiIndex

In [46]:
#reset_index() - converts a multi-index back to a regular index
house_data_reviewed.reset_index()

Unnamed: 0,SellerG,Car,len
0,@Realty,0.0,1
1,@Realty,3.0,1
2,ASL,1.0,2
3,ASL,2.0,2
4,Abercromby's,0.0,1
...,...,...,...
738,iTRAK,1.0,2
739,iTRAK,2.0,5
740,iTRAK,3.0,2
741,iTRAK,4.0,2
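
A sketch of working with a multi-index on toy data (made up for illustration): a single row is addressed with a tuple holding one label per level, and reset_index() turns the levels back into ordinary columns.

```python
import pandas as pd

# toy data for illustration only
df = pd.DataFrame({"SellerG": ["ASL", "ASL", "ASL", "iTRAK"],
                   "Car": [1.0, 1.0, 2.0, 1.0],
                   "Suburb": ["A", "B", "C", "D"]})

counts = df.groupby(["SellerG", "Car"]).Suburb.agg(["count"])

# with a multi-index, a row is addressed by a tuple with one label per level
asl_one_car = counts.loc[("ASL", 1.0), "count"]     # 2

# reset_index() moves both levels back into ordinary columns
flat = counts.reset_index()

print(asl_one_car)
print(list(flat.columns))   # ['SellerG', 'Car', 'count']
```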


In [47]:
#sorting - to get the data in the order we want, we can sort it ourselves
#sort_values()
house_data_reviewed = house_data_reviewed.reset_index()
house_data_reviewed.sort_values(by='len')

Unnamed: 0,SellerG,Car,len
0,@Realty,0.0,1
415,Melbourne,6.0,1
414,Melbourne,5.0,1
413,Melbourne,3.0,1
410,Meadows,2.0,1
...,...,...,...
32,Barry,2.0,492
722,hockingstuart,1.0,523
309,Jellis,1.0,529
443,Nelson,2.0,561


In [48]:
#pass ascending=False to sort in descending order
house_data_reviewed.sort_values(by='len', ascending=False)

Unnamed: 0,SellerG,Car,len
442,Nelson,1.0,732
443,Nelson,2.0,561
309,Jellis,1.0,529
722,hockingstuart,1.0,523
32,Barry,2.0,492
...,...,...,...
352,Leeburn,4.0,1
350,Leeburn,1.0,1
348,Leased,0.0,1
347,Leading,4.0,1


In [49]:
#to sort by index values we can use sort_index()
house_data_reviewed.sort_index()

Unnamed: 0,SellerG,Car,len
0,@Realty,0.0,1
1,@Realty,3.0,1
2,ASL,1.0,2
3,ASL,2.0,2
4,Abercromby's,0.0,1
...,...,...,...
738,iTRAK,1.0,2
739,iTRAK,2.0,5
740,iTRAK,3.0,2
741,iTRAK,4.0,2


In [50]:
#and also we can sort by more than one column at a time
house_data_reviewed.sort_values(by=['Car', 'len'])

Unnamed: 0,SellerG,Car,len
0,@Realty,0.0,1
4,Abercromby's,0.0,1
19,Ascend,0.0,1
26,Barlow,0.0,1
68,Brace,0.0,1
...,...,...,...
316,Jellis,8.0,3
317,Jellis,9.0,1
39,Barry,10.0,1
64,Biggin,10.0,1
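
When sorting by several columns, `ascending` can also be a list with one flag per column. A minimal sketch on made-up rows (the `toy` frame is an assumption, not the real data):

```python
import pandas as pd

# Toy stand-in for house_data_reviewed (hypothetical values)
toy = pd.DataFrame({
    "SellerG": ["a", "b", "c", "d"],
    "Car": [1.0, 2.0, 1.0, 2.0],
    "len": [5, 3, 9, 7],
})

# Car ascending, then len descending within each Car value
ordered = toy.sort_values(by=["Car", "len"], ascending=[True, False])
print(ordered)
```

`sort_values()` returns a new DataFrame, so `toy` itself keeps its original row order.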


##Data types and missing values

In [51]:
#to find the data type of a column, use dtype
house_data.Price.dtype

dtype('float64')

In [52]:
#and also we can find datatypes of every column
house_data.dtypes

Suburb            object
Address           object
Rooms              int64
Type              object
Price            float64
Method            object
SellerG           object
Date              object
Distance         float64
Postcode         float64
Bedroom2         float64
Bathroom         float64
Car              float64
Landsize         float64
BuildingArea     float64
YearBuilt        float64
CouncilArea       object
Lattitude        float64
Longtitude       float64
Regionname        object
Propertycount    float64
dtype: object

In [53]:
#data type conversion is also possible using astype()
house_data.Rooms.astype('float64')

0        2.0
1        2.0
2        3.0
3        3.0
4        4.0
        ... 
13575    4.0
13576    3.0
13577    3.0
13578    4.0
13579    4.0
Name: Rooms, Length: 13580, dtype: float64

In [54]:
#and index has its own dtype
house_data.index.dtype

dtype('int64')
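
A short sketch of two points about `astype()`, using hypothetical Series named after the columns above. First, it returns a new object and leaves the original untouched. Second, a column that contains NaN cannot be cast to plain `int64`, which may be why whole-number columns with missing entries show up as `float64`. The nullable `Int64` dtype is one way around that.

```python
import pandas as pd

# Hypothetical values, named after columns in house_data
rooms = pd.Series([2, 3, 4], name="Rooms")
as_float = rooms.astype("float64")   # rooms itself is unchanged

cars = pd.Series([1.0, None, 2.0], name="Car")   # None is stored as NaN
try:
    cars.astype("int64")
    cast_ok = True
except ValueError:
    # NaN has no int64 representation, so pandas refuses the cast
    cast_ok = False

# The nullable integer dtype keeps the missing entry as <NA>
nullable = cars.astype("Int64")
print(as_float.dtype, cast_ok, nullable.dtype)
```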

In [60]:
#missing values are shown as NaN ("Not a Number")
#pd.isnull() marks the NaN entries, so it can be used to select them
#pd.notnull() does the opposite and marks the entries that are not NaN
house_data[pd.isnull(house_data.CouncilArea)]

Unnamed: 0,Suburb,Address,Rooms,Type,Price,Method,SellerG,Date,Distance,Postcode,...,Bathroom,Car,Landsize,BuildingArea,YearBuilt,CouncilArea,Lattitude,Longtitude,Regionname,Propertycount
7584,Brighton East,7 Roberts Ct,3,h,1270000.0,VB,Ray,20/05/2017,10.7,3187.0,...,1.0,3.0,724.0,,,,-37.92910,145.02970,Southern Metropolitan,6938.0
10797,Reservoir,48 Crevelli St,3,h,526250.0,SP,Barry,8/07/2017,12.0,3073.0,...,1.0,1.0,308.0,,,,-37.72828,145.03033,Northern Metropolitan,21650.0
12213,Aberfeldie,1 Alma St,4,h,1436000.0,S,Brad,3/09/2017,7.5,3040.0,...,3.0,3.0,511.0,187.0,1922.0,,-37.75788,144.90487,Western Metropolitan,1543.0
12214,Albion,40 Ridley St,5,h,905000.0,S,hockingstuart,3/09/2017,10.5,3020.0,...,2.0,3.0,732.0,,1925.0,,-37.78345,144.82295,Western Metropolitan,2185.0
12215,Alphington,22 Harker St,4,h,1680000.0,S,Love,3/09/2017,5.7,3078.0,...,3.0,2.0,720.0,,,,-37.77928,145.02993,Northern Metropolitan,2211.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
13575,Wheelers Hill,12 Strada Cr,4,h,1245000.0,S,Barry,26/08/2017,16.7,3150.0,...,2.0,2.0,652.0,,1981.0,,-37.90562,145.16761,South-Eastern Metropolitan,7392.0
13576,Williamstown,77 Merrett Dr,3,h,1031000.0,SP,Williams,26/08/2017,6.8,3016.0,...,2.0,2.0,333.0,133.0,1995.0,,-37.85927,144.87904,Western Metropolitan,6380.0
13577,Williamstown,83 Power St,3,h,1170000.0,S,Raine,26/08/2017,6.8,3016.0,...,2.0,4.0,436.0,,1997.0,,-37.85274,144.88738,Western Metropolitan,6380.0
13578,Williamstown,96 Verdon St,4,h,2500000.0,PI,Sweeney,26/08/2017,6.8,3016.0,...,1.0,5.0,866.0,157.0,1920.0,,-37.85908,144.89299,Western Metropolitan,6380.0


In [58]:
#you can also replace the missing values using fillna()
house_data.CouncilArea.fillna('Unknown')

0          Yarra
1          Yarra
2          Yarra
3          Yarra
4          Yarra
          ...   
13575    Unknown
13576    Unknown
13577    Unknown
13578    Unknown
13579    Unknown
Name: CouncilArea, Length: 13580, dtype: object
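
A minimal sketch of counting and filling missing values on made-up data (the `toy` frame is an assumption). Note that `fillna()` returns a new Series, so the result has to be assigned back if the change should be kept.

```python
import numpy as np
import pandas as pd

# Toy stand-in for house_data (hypothetical values)
toy = pd.DataFrame({
    "Suburb": ["A", "B", "C"],
    "CouncilArea": ["Yarra", np.nan, None],
})

# Summing the boolean mask counts the missing entries
n_missing = int(toy.CouncilArea.isnull().sum())

# notnull() keeps only the rows that have a value
present = toy[toy.CouncilArea.notnull()]

# fillna() does not modify toy; store the result to keep it
filled = toy.CouncilArea.fillna("Unknown")
still_missing = int(toy.CouncilArea.isnull().sum())
print(n_missing, len(present), list(filled), still_missing)
```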

In [61]:
#you can also swap one value for another using replace()
house_data.SellerG.replace('Barry', 'Arunisto')

0          Biggin
1          Biggin
2          Biggin
3          Biggin
4          Nelson
           ...   
13575    Arunisto
13576    Williams
13577       Raine
13578     Sweeney
13579     Village
Name: SellerG, Length: 13580, dtype: object
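
A small sketch showing that `replace()` with a plain string matches whole values, not substrings, and that a dict can map several values at once. The seller names below are made up for illustration.

```python
import pandas as pd

# Hypothetical seller names
sellers = pd.Series(["Barry", "Barry Plant", "Nelson"])

# Only the entry that is exactly "Barry" changes
swapped = sellers.replace("Barry", "Arunisto")

# A dict replaces several values in one call
mapped = sellers.replace({"Barry": "B", "Nelson": "N"})
print(list(swapped))
print(list(mapped))
```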

##Renaming and Combining

In [63]:
#you can rename columns by using rename() method
house_data.rename(columns={'SellerG':'Owner'})

Unnamed: 0,Suburb,Address,Rooms,Type,Price,Method,Owner,Date,Distance,Postcode,...,Bathroom,Car,Landsize,BuildingArea,YearBuilt,CouncilArea,Lattitude,Longtitude,Regionname,Propertycount
0,Abbotsford,85 Turner St,2,h,1480000.0,S,Biggin,3/12/2016,2.5,3067.0,...,1.0,1.0,202.0,,,Yarra,-37.79960,144.99840,Northern Metropolitan,4019.0
1,Abbotsford,25 Bloomburg St,2,h,1035000.0,S,Biggin,4/02/2016,2.5,3067.0,...,1.0,0.0,156.0,79.0,1900.0,Yarra,-37.80790,144.99340,Northern Metropolitan,4019.0
2,Abbotsford,5 Charles St,3,h,1465000.0,SP,Biggin,4/03/2017,2.5,3067.0,...,2.0,0.0,134.0,150.0,1900.0,Yarra,-37.80930,144.99440,Northern Metropolitan,4019.0
3,Abbotsford,40 Federation La,3,h,850000.0,PI,Biggin,4/03/2017,2.5,3067.0,...,2.0,1.0,94.0,,,Yarra,-37.79690,144.99690,Northern Metropolitan,4019.0
4,Abbotsford,55a Park St,4,h,1600000.0,VB,Nelson,4/06/2016,2.5,3067.0,...,1.0,2.0,120.0,142.0,2014.0,Yarra,-37.80720,144.99410,Northern Metropolitan,4019.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
13575,Wheelers Hill,12 Strada Cr,4,h,1245000.0,S,Barry,26/08/2017,16.7,3150.0,...,2.0,2.0,652.0,,1981.0,,-37.90562,145.16761,South-Eastern Metropolitan,7392.0
13576,Williamstown,77 Merrett Dr,3,h,1031000.0,SP,Williams,26/08/2017,6.8,3016.0,...,2.0,2.0,333.0,133.0,1995.0,,-37.85927,144.87904,Western Metropolitan,6380.0
13577,Williamstown,83 Power St,3,h,1170000.0,S,Raine,26/08/2017,6.8,3016.0,...,2.0,4.0,436.0,,1997.0,,-37.85274,144.88738,Western Metropolitan,6380.0
13578,Williamstown,96 Verdon St,4,h,2500000.0,PI,Sweeney,26/08/2017,6.8,3016.0,...,1.0,5.0,866.0,157.0,1920.0,,-37.85908,144.89299,Western Metropolitan,6380.0


In [64]:
#rename() can also change index labels, not only column names
house_data.rename(index={0:'FirstEntry', 1:'SecondEntry'})

Unnamed: 0,Suburb,Address,Rooms,Type,Price,Method,SellerG,Date,Distance,Postcode,...,Bathroom,Car,Landsize,BuildingArea,YearBuilt,CouncilArea,Lattitude,Longtitude,Regionname,Propertycount
FirstEntry,Abbotsford,85 Turner St,2,h,1480000.0,S,Biggin,3/12/2016,2.5,3067.0,...,1.0,1.0,202.0,,,Yarra,-37.79960,144.99840,Northern Metropolitan,4019.0
SecondEntry,Abbotsford,25 Bloomburg St,2,h,1035000.0,S,Biggin,4/02/2016,2.5,3067.0,...,1.0,0.0,156.0,79.0,1900.0,Yarra,-37.80790,144.99340,Northern Metropolitan,4019.0
2,Abbotsford,5 Charles St,3,h,1465000.0,SP,Biggin,4/03/2017,2.5,3067.0,...,2.0,0.0,134.0,150.0,1900.0,Yarra,-37.80930,144.99440,Northern Metropolitan,4019.0
3,Abbotsford,40 Federation La,3,h,850000.0,PI,Biggin,4/03/2017,2.5,3067.0,...,2.0,1.0,94.0,,,Yarra,-37.79690,144.99690,Northern Metropolitan,4019.0
4,Abbotsford,55a Park St,4,h,1600000.0,VB,Nelson,4/06/2016,2.5,3067.0,...,1.0,2.0,120.0,142.0,2014.0,Yarra,-37.80720,144.99410,Northern Metropolitan,4019.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
13575,Wheelers Hill,12 Strada Cr,4,h,1245000.0,S,Barry,26/08/2017,16.7,3150.0,...,2.0,2.0,652.0,,1981.0,,-37.90562,145.16761,South-Eastern Metropolitan,7392.0
13576,Williamstown,77 Merrett Dr,3,h,1031000.0,SP,Williams,26/08/2017,6.8,3016.0,...,2.0,2.0,333.0,133.0,1995.0,,-37.85927,144.87904,Western Metropolitan,6380.0
13577,Williamstown,83 Power St,3,h,1170000.0,S,Raine,26/08/2017,6.8,3016.0,...,2.0,4.0,436.0,,1997.0,,-37.85274,144.88738,Western Metropolitan,6380.0
13578,Williamstown,96 Verdon St,4,h,2500000.0,PI,Sweeney,26/08/2017,6.8,3016.0,...,1.0,5.0,866.0,157.0,1920.0,,-37.85908,144.89299,Western Metropolitan,6380.0


In [65]:
#we can also give the row axis and the column axis a name using rename_axis()
house_data.rename_axis("houses", axis="rows").rename_axis("fields", axis="columns")

fields,Suburb,Address,Rooms,Type,Price,Method,SellerG,Date,Distance,Postcode,...,Bathroom,Car,Landsize,BuildingArea,YearBuilt,CouncilArea,Lattitude,Longtitude,Regionname,Propertycount
houses,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
0,Abbotsford,85 Turner St,2,h,1480000.0,S,Biggin,3/12/2016,2.5,3067.0,...,1.0,1.0,202.0,,,Yarra,-37.79960,144.99840,Northern Metropolitan,4019.0
1,Abbotsford,25 Bloomburg St,2,h,1035000.0,S,Biggin,4/02/2016,2.5,3067.0,...,1.0,0.0,156.0,79.0,1900.0,Yarra,-37.80790,144.99340,Northern Metropolitan,4019.0
2,Abbotsford,5 Charles St,3,h,1465000.0,SP,Biggin,4/03/2017,2.5,3067.0,...,2.0,0.0,134.0,150.0,1900.0,Yarra,-37.80930,144.99440,Northern Metropolitan,4019.0
3,Abbotsford,40 Federation La,3,h,850000.0,PI,Biggin,4/03/2017,2.5,3067.0,...,2.0,1.0,94.0,,,Yarra,-37.79690,144.99690,Northern Metropolitan,4019.0
4,Abbotsford,55a Park St,4,h,1600000.0,VB,Nelson,4/06/2016,2.5,3067.0,...,1.0,2.0,120.0,142.0,2014.0,Yarra,-37.80720,144.99410,Northern Metropolitan,4019.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
13575,Wheelers Hill,12 Strada Cr,4,h,1245000.0,S,Barry,26/08/2017,16.7,3150.0,...,2.0,2.0,652.0,,1981.0,,-37.90562,145.16761,South-Eastern Metropolitan,7392.0
13576,Williamstown,77 Merrett Dr,3,h,1031000.0,SP,Williams,26/08/2017,6.8,3016.0,...,2.0,2.0,333.0,133.0,1995.0,,-37.85927,144.87904,Western Metropolitan,6380.0
13577,Williamstown,83 Power St,3,h,1170000.0,S,Raine,26/08/2017,6.8,3016.0,...,2.0,4.0,436.0,,1997.0,,-37.85274,144.88738,Western Metropolitan,6380.0
13578,Williamstown,96 Verdon St,4,h,2500000.0,PI,Sweeney,26/08/2017,6.8,3016.0,...,1.0,5.0,866.0,157.0,1920.0,,-37.85908,144.89299,Western Metropolitan,6380.0
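
A minimal sketch of `rename()` on made-up data (the `toy` frame is an assumption). It renames a column and an index label in one call, and also shows that a function can be passed instead of a dict. Like the calls above, `rename()` returns a new DataFrame and leaves the original as it was.

```python
import pandas as pd

# Toy stand-in for house_data (hypothetical single row)
toy = pd.DataFrame({"SellerG": ["Biggin"], "Price": [1480000.0]})

# Column and index label renamed together
renamed = toy.rename(columns={"SellerG": "Owner"}, index={0: "FirstEntry"})

# A function is applied to every column name
lowered = toy.rename(columns=str.lower)
print(list(renamed.columns), list(renamed.index), list(lowered.columns))
```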


In [66]:
#we can also combine two datasets
#pandas offers concat(), join(), and merge() for this
#concat() stacks DataFrames that have the same columns on top of each other
us_youtube = pd.read_csv('/content/drive/MyDrive/Datascience&MachineLearning/datasets/USvideos.csv')
in_youtube = pd.read_csv('/content/drive/MyDrive/Datascience&MachineLearning/datasets/INvideos.csv')
pd.concat([us_youtube, in_youtube])

Unnamed: 0,video_id,trending_date,title,channel_title,category_id,publish_time,tags,views,likes,dislikes,comment_count,thumbnail_link,comments_disabled,ratings_disabled,video_error_or_removed,description
0,2kyS6SvSYSE,17.14.11,WE WANT TO TALK ABOUT OUR MARRIAGE,CaseyNeistat,22,2017-11-13T17:13:01.000Z,SHANtell martin,748374,57527,2966,15954,https://i.ytimg.com/vi/2kyS6SvSYSE/default.jpg,False,False,False,SHANTELL'S CHANNEL - https://www.youtube.com/s...
1,1ZAPwfrtAFY,17.14.11,The Trump Presidency: Last Week Tonight with J...,LastWeekTonight,24,2017-11-13T07:30:00.000Z,"last week tonight trump presidency|""last week ...",2418783,97185,6146,12703,https://i.ytimg.com/vi/1ZAPwfrtAFY/default.jpg,False,False,False,"One year after the presidential election, John..."
2,5qpjK5DgCt4,17.14.11,"Racist Superman | Rudy Mancuso, King Bach & Le...",Rudy Mancuso,23,2017-11-12T19:05:24.000Z,"racist superman|""rudy""|""mancuso""|""king""|""bach""...",3191434,146033,5339,8181,https://i.ytimg.com/vi/5qpjK5DgCt4/default.jpg,False,False,False,WATCH MY PREVIOUS VIDEO ▶ \n\nSUBSCRIBE ► http...
3,puqaWrEC7tY,17.14.11,Nickelback Lyrics: Real or Fake?,Good Mythical Morning,24,2017-11-13T11:00:04.000Z,"rhett and link|""gmm""|""good mythical morning""|""...",343168,10172,666,2146,https://i.ytimg.com/vi/puqaWrEC7tY/default.jpg,False,False,False,Today we find out if Link is a Nickelback amat...
4,d380meD0W0M,17.14.11,I Dare You: GOING BALD!?,nigahiga,24,2017-11-12T18:01:41.000Z,"ryan|""higa""|""higatv""|""nigahiga""|""i dare you""|""...",2095731,132235,1989,17518,https://i.ytimg.com/vi/d380meD0W0M/default.jpg,False,False,False,I know it's been a while since we did this sho...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
37347,iNHecA3PJCo,18.14.06,फेकू आशिक़ - राजस्थान की सबसे शानदार कॉमेडी | ...,RDC Rajasthani,23,2018-06-13T08:01:11.000Z,"twinkle vaishnav comedy|""twinkle vaishnav""|""tw...",214378,3291,404,196,https://i.ytimg.com/vi/iNHecA3PJCo/default.jpg,False,False,False,PRG Music & RDC Rajasthani presents फेकू आशिक़...
37348,dpPmPbhcslM,18.14.06,Seetha | Flowers | Ep# 364,Flowers TV,24,2018-06-13T11:30:04.000Z,"flowers serials|""actress""|""malayalam serials""|...",406828,1726,478,1428,https://i.ytimg.com/vi/dpPmPbhcslM/default.jpg,False,False,False,"Flowers - A R Rahman Show,Book your Tickets He..."
37349,mV6aztP58f8,18.14.06,Bhramanam I Episode 87 - 12 June 2018 I Mazhav...,Mazhavil Manorama,24,2018-06-13T05:00:02.000Z,"mazhavil manorama|""bhramanam full episode""|""gt...",386319,1216,453,697,https://i.ytimg.com/vi/mV6aztP58f8/default.jpg,False,False,False,Subscribe to Mazhavil Manorama now for your da...
37350,qxqDNP1bDEw,18.14.06,Nua Bohu | Full Ep 285 | 13th June 2018 | Odia...,Tarang TV,24,2018-06-13T15:07:49.000Z,"tarang|""tarang tv""|""tarang tv online""|""tarang ...",130263,698,115,65,https://i.ytimg.com/vi/qxqDNP1bDEw/default.jpg,False,False,False,Nuabohu : Story of a rustic village girl who w...
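
By default `concat()` keeps each frame's own index, so labels repeat in the combined result. A minimal sketch with two tiny made-up frames (`us` and `ind` are hypothetical, not the real CSV files) showing two common ways to handle that:

```python
import pandas as pd

# Hypothetical stand-ins for us_youtube and in_youtube
us = pd.DataFrame({"video_id": ["a", "b"], "views": [10, 20]})
ind = pd.DataFrame({"video_id": ["c"], "views": [30]})

stacked = pd.concat([us, ind])                        # index repeats: 0, 1, 0
renumbered = pd.concat([us, ind], ignore_index=True)  # fresh index: 0, 1, 2

# keys adds an outer index level that records the source frame
labelled = pd.concat([us, ind], keys=["US", "IN"])
print(list(stacked.index), list(renumbered.index))
print(labelled.loc["IN"])
```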


In [67]:
#join() combines DataFrame objects that share an index
#e.g. we match videos that were trending on the same day in India and the US
#join() is a left join by default, so US rows with no Indian match get NaN in the _IN columns
left = us_youtube.set_index(['title', 'trending_date'])
right = in_youtube.set_index(['title', 'trending_date'])
left.join(right, lsuffix='_US', rsuffix='_IN')

Unnamed: 0_level_0,Unnamed: 1_level_0,video_id_US,channel_title_US,category_id_US,publish_time_US,tags_US,views_US,likes_US,dislikes_US,comment_count_US,thumbnail_link_US,...,tags_IN,views_IN,likes_IN,dislikes_IN,comment_count_IN,thumbnail_link_IN,comments_disabled_IN,ratings_disabled_IN,video_error_or_removed_IN,description_IN
title,trending_date,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1
#184 Making a PCB using EasyEDA. // Review,17.07.12,BPmgDhwbd1w,MickMake,28,2017-12-02T14:05:07.000Z,"MickMake|""electronics""|""embedded""|""maker""|""diy...",3237,161,2,35,https://i.ytimg.com/vi/BPmgDhwbd1w/default.jpg,...,,,,,,,,,,
"#23 Feed The Homeless | One List, One Life",17.01.12,4qakFfGRV4E,"One List , One Life",22,2017-11-30T15:36:12.000Z,"homeless|""experiment""|""people""|""man""|""singing""...",32385,568,77,97,https://i.ytimg.com/vi/4qakFfGRV4E/default.jpg,...,,,,,,,,,,
"#23 Feed The Homeless | One List, One Life",17.02.12,4qakFfGRV4E,"One List , One Life",22,2017-11-30T15:36:12.000Z,"homeless|""experiment""|""people""|""man""|""singing""...",40644,667,85,106,https://i.ytimg.com/vi/4qakFfGRV4E/default.jpg,...,,,,,,,,,,
"#23 Feed The Homeless | One List, One Life",17.03.12,4qakFfGRV4E,"One List , One Life",22,2017-11-30T15:36:12.000Z,"homeless|""experiment""|""people""|""man""|""singing""...",41274,683,85,110,https://i.ytimg.com/vi/4qakFfGRV4E/default.jpg,...,,,,,,,,,,
"#23 Feed The Homeless | One List, One Life",17.04.12,4qakFfGRV4E,"One List , One Life",22,2017-11-30T15:36:12.000Z,"homeless|""experiment""|""people""|""man""|""singing""...",41742,707,86,112,https://i.ytimg.com/vi/4qakFfGRV4E/default.jpg,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
🎃 How to make Pumpkin Pie Mistakes,17.28.11,bAfn2duIlN8,iJustine,22,2017-11-21T19:39:43.000Z,"ijustine|""how to make pumpkin pie""|""pumpkin pi...",186407,8034,301,1211,https://i.ytimg.com/vi/bAfn2duIlN8/default.jpg,...,,,,,,,,,,
🎃 How to make Pumpkin Pie Mistakes,17.29.11,bAfn2duIlN8,iJustine,22,2017-11-21T19:39:43.000Z,"ijustine|""how to make pumpkin pie""|""pumpkin pi...",193223,8141,302,1226,https://i.ytimg.com/vi/bAfn2duIlN8/default.jpg,...,,,,,,,,,,
"😱 $1,145 iPhone Case!!",18.04.02,r3J784MSRyQ,iJustine,28,2018-02-02T23:33:00.000Z,"ijustine|""gray international""|""most expensive ...",408713,15040,2038,2617,https://i.ytimg.com/vi/r3J784MSRyQ/default.jpg,...,,,,,,,,,,
"😱 $1,145 iPhone Case!!",18.05.02,r3J784MSRyQ,iJustine,28,2018-02-02T23:33:00.000Z,"ijustine|""gray international""|""most expensive ...",673040,18904,3852,3639,https://i.ytimg.com/vi/r3J784MSRyQ/default.jpg,...,,,,,,,,,,

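A minimal sketch, on made-up data, of how to keep only the videos that appear in both frames. The `us` and `ind` frames are hypothetical. Passing `how="inner"` to `join()` drops the unmatched rows, and `pd.merge()` gives the same rows without setting an index first, because it joins on columns and uses an inner join by default.

```python
import pandas as pd

# Hypothetical stand-ins for us_youtube and in_youtube
us = pd.DataFrame({
    "title": ["t1", "t2"],
    "trending_date": ["17.14.11", "17.14.11"],
    "views": [100, 200],
})
ind = pd.DataFrame({
    "title": ["t2", "t3"],
    "trending_date": ["17.14.11", "17.14.11"],
    "views": [50, 60],
})

left = us.set_index(["title", "trending_date"])
right = ind.set_index(["title", "trending_date"])

# Default left join: every US row is kept, unmatched ones get NaN
left_joined = left.join(right, lsuffix="_US", rsuffix="_IN")

# Inner join: only videos trending in both countries on the same day
both = left.join(right, lsuffix="_US", rsuffix="_IN", how="inner")

# merge() matches on columns directly
merged = pd.merge(us, ind, on=["title", "trending_date"],
                  suffixes=("_US", "_IN"))
print(len(left_joined), len(both), len(merged))
```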