# Section 1: Business Understanding

This is my first project in my Udacity Nanodegree Program on Data Science, entitled "Write A Data Science Blog Post".
It analyzes a dataset of my choice, guided by four questions.
In this case, I am analyzing Seattle AirBnB data.

AirBnB is a company in the tourism business: it connects tourists with home accommodations for short-term stays. Some tourists
prefer this type of accommodation to hotels for various reasons. The owners of these accommodations submit the details of their
homes to AirBnB, and each home is listed with a listing id, a date of listing, and a price per night. This work looks at AirBnB's
business activities in the city of Seattle, USA, partly in 2016 and partly in 2017.
In another notebook file, called boston_airbnb, AirBnB's business activities in Boston (also in the USA) were also
analyzed. The aim is to compare both cities using their data, and the guiding questions are:

1. Which listing ids were most often used and which were least often used?
2. What is the maximum number of accommodations listed per day and what is the minimum number? On which dates did they occur?
3. Which price tag was most often used to list accommodations and which was least used? How often was each used?
4. Is there any correlation among the feature variables?


# Section 2: Data Understanding

In [50]:
import pandas as pd
from pandas import DataFrame, Series
import numpy as np


original_df = pd.read_csv('/Users/izzit/Desktop/All/udacity_tutorial/project1/seattle_data.csv')
original_df.shape

# The original data has 1,393,570 rows and 4 columns (see below)

(1393570, 4)

In [3]:
cols = original_df.columns # Columns in the data
cols
# The columns of the data are 'listing_id', 'date', 'available', 'price' (see below)

Index(['listing_id', 'date', 'available', 'price'], dtype='object')

In [2]:
first_fifty = original_df.head(50) # The first fifty records of the data (see below)
first_fifty

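|  | listing_id | date | available | price |
|---|---|---|---|---|
| 0 | 241032 | 2016-01-04 | t | $85.00 |
| 1 | 241032 | 2016-01-05 | t | $85.00 |
| 2 | 241032 | 2016-01-06 | f | NaN |
| 3 | 241032 | 2016-01-07 | f | NaN |
| 4 | 241032 | 2016-01-08 | f | NaN |
| 5 | 241032 | 2016-01-09 | f | NaN |
| 6 | 241032 | 2016-01-10 | f | NaN |
| 7 | 241032 | 2016-01-11 | f | NaN |
| 8 | 241032 | 2016-01-12 | f | NaN |
| 9 | 241032 | 2016-01-13 | t | $85.00 |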


In [4]:
last_fifty = original_df.tail(50) # The last fifty records of the data (see below)
last_fifty

|  | listing_id | date | available | price |
|---|---|---|---|---|
| 1393520 | 10208623 | 2016-11-14 | f | NaN |
| 1393521 | 10208623 | 2016-11-15 | f | NaN |
| 1393522 | 10208623 | 2016-11-16 | f | NaN |
| 1393523 | 10208623 | 2016-11-17 | f | NaN |
| 1393524 | 10208623 | 2016-11-18 | f | NaN |
| 1393525 | 10208623 | 2016-11-19 | f | NaN |
| 1393526 | 10208623 | 2016-11-20 | f | NaN |
| 1393527 | 10208623 | 2016-11-21 | f | NaN |
| 1393528 | 10208623 | 2016-11-22 | f | NaN |
| 1393529 | 10208623 | 2016-11-23 | f | NaN |


In [5]:
nan_cols = original_df.isnull().sum() # Number of missing values in each column (np.sum on a DataFrame is deprecated in recent pandas)
nan_cols
# Only the 'price' column has missing values (see below)

listing_id         0
date               0
available          0
price         459028
dtype: int64

In [6]:
price_nan = np.sum(original_df['price'].isnull()) # Missing values in price column
price_nan
# The number of missing values in the price column is 459,028 (see below)

459028

In [7]:
nan_prop = price_nan/original_df.shape[0] # Proportion of missing values to the total entries
nan_prop
# The proportion of missing values to the total entries is 0.32938998399793334 (33% approx.) (see below)

0.32938998399793334
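The sample rows shown earlier suggest that `price` is empty exactly when `available` is `'f'`. One way to test that is a cross-tabulation of the two columns. The sketch below runs it on a small toy frame copied from those first rows (the name `calendar` is a stand-in for `original_df`, not part of the original notebook). If the two line up on the full data, dropping missing prices in Section 3 amounts to keeping only the available nights, which is worth bearing in mind when reading Questions 1 to 3.

```python
import numpy as np
import pandas as pd

# Toy stand-in for original_df, copied from the first rows shown above.
# To run the check on the real data, replace `calendar` with original_df.
calendar = pd.DataFrame({
    'listing_id': [241032] * 4,
    'date': ['2016-01-04', '2016-01-05', '2016-01-06', '2016-01-07'],
    'available': ['t', 't', 'f', 'f'],
    'price': ['$85.00', '$85.00', np.nan, np.nan],
})

# Rows: availability flag; columns: whether the price is missing.
check = pd.crosstab(calendar['available'], calendar['price'].isnull())
print(check)

# True only if a price is missing exactly on the nights marked 'f'.
aligned = bool((calendar['price'].isnull() == (calendar['available'] == 'f')).all())
print(aligned)  # True
```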

# Section 3: Data Preparation

In [8]:
df_without_nan = original_df.dropna(subset=['price'], axis=0) # Removing missing values
df_without_nan.shape

# The dataset without missing values has 934,542 rows and 4 columns (see below)

(934542, 4)
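Prices are stored as text such as `$85.00` or `$1,650.00`, so they sort and compare as strings rather than numbers. A minimal sketch of converting them to floats, assuming every non-missing value follows that `$` plus thousands-separator pattern (the helper name `price_to_float` is hypothetical):

```python
import numpy as np
import pandas as pd


def price_to_float(prices: pd.Series) -> pd.Series:
    """Strip '$' and ',' from price strings and cast to float; missing values stay NaN."""
    return prices.str.replace(r'[$,]', '', regex=True).astype(float)


sample = pd.Series(['$85.00', '$1,650.00', np.nan, '$150.00'])
converted = price_to_float(sample)
print(converted.tolist())  # [85.0, 1650.0, nan, 150.0]
```

On the cleaned data this would be `price_to_float(df_without_nan['price'])`.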

# Section 4: Evaluation

**Question 1. Which listing ids were most often used and which were least often used?**

In [9]:
listing_id_count_df = df_without_nan['listing_id'].value_counts() # Gives the number of times each listing id featured
listing_id_count_df

# The most frequent listing ids appear 365 times each, and the least frequent appear only once each.
# In total, there are 3,723 distinct listing ids.

11012       365
2926776     365
1594412     365
4669377     365
4993710     365
           ... 
9868607       1
9714078       1
10210625      1
3819831       1
10235136      1
Name: listing_id, Length: 3723, dtype: int64

In [10]:
listing_id_count_df.shape[0] # Total number of rows = 3723 (see below)

3723

In [11]:
# Highest-featured listing ids
highest_listing_id_count = np.sum(df_without_nan['listing_id'].value_counts() == 365) # Number of distinct listing ids that appear 365 times
highest_listing_id_count
# There are 678 distinct listing ids that appear the maximum number of times (i.e., 365 times each).

678

In [13]:
col_reset_df = (df_without_nan['listing_id'].value_counts() == 365).rename_axis('listing_id').reset_index(name='available')
# Moves the listing ids from the index into a 'listing_id' column; 'available' is True where the id appears 365 times.
# rename_axis + reset_index(name=...) give the same column names in pandas 1.x and 2.x (see below)
col_reset_df


|  | listing_id | available |
|---|---|---|
| 0 | 11012 | True |
| 1 | 2926776 | True |
| 2 | 1594412 | True |
| 3 | 4669377 | True |
| 4 | 4993710 | True |
| ... | ... | ... |
| 3718 | 9868607 | False |
| 3719 | 9714078 | False |
| 3720 | 10210625 | False |
| 3721 | 3819831 | False |


In [14]:
only_listing_id_df = col_reset_df.drop('available', axis=1) # Dropping the 'available' column to keep only the listing id column
only_listing_id_df

|  | listing_id |
|---|---|
| 0 | 11012 |
| 1 | 2926776 |
| 2 | 1594412 |
| 3 | 4669377 |
| 4 | 4993710 |
| ... | ... |
| 3718 | 9868607 |
| 3719 | 9714078 |
| 3720 | 10210625 |
| 3721 | 3819831 |


In [15]:
only_highest_listing_id_df = only_listing_id_df.iloc[0:highest_listing_id_count] # Dataframe of only the listing ids that featured the highest number of times
only_highest_listing_id_df

|  | listing_id |
|---|---|
| 0 | 11012 |
| 1 | 2926776 |
| 2 | 1594412 |
| 3 | 4669377 |
| 4 | 4993710 |
| ... | ... |
| 673 | 5559643 |
| 674 | 5219336 |
| 675 | 103920 |
| 676 | 5927083 |
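The reset/drop/slice chain above relies on `value_counts()` being sorted so that all 365-count ids come first. A shorter sketch filters the counts directly with a boolean mask, which works regardless of order; here `toy` is a made-up stand-in for `df_without_nan`:

```python
import pandas as pd

# Made-up data: listing 1 appears three times, listing 2 twice, listings 3 and 4 once.
toy = pd.DataFrame({'listing_id': [1, 1, 1, 2, 2, 3, 4]})

counts = toy['listing_id'].value_counts()
# Keep the ids whose count equals the maximum (or minimum) count.
most_frequent_ids = counts[counts == counts.max()].index.tolist()
least_frequent_ids = sorted(counts[counts == counts.min()].index.tolist())
print(most_frequent_ids)   # [1]
print(least_frequent_ids)  # [3, 4]
```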


In [16]:
# Breaking the 678 entries into lists of 60 each, for easier visualization
listing1 = list(only_highest_listing_id_df['listing_id'].iloc[0:60])
listing2 = list(only_highest_listing_id_df['listing_id'].iloc[60:120])
listing3 = list(only_highest_listing_id_df['listing_id'].iloc[120:180])
listing4 = list(only_highest_listing_id_df['listing_id'].iloc[180:240])
listing5 = list(only_highest_listing_id_df['listing_id'].iloc[240:300])
listing6 = list(only_highest_listing_id_df['listing_id'].iloc[300:360])
listing7 = list(only_highest_listing_id_df['listing_id'].iloc[360:420])
listing8 = list(only_highest_listing_id_df['listing_id'].iloc[420:480])
listing9 = list(only_highest_listing_id_df['listing_id'].iloc[480:540])
listing10 = list(only_highest_listing_id_df['listing_id'].iloc[540:600])
listing11 = list(only_highest_listing_id_df['listing_id'].iloc[600:660])
listing12 = list(only_highest_listing_id_df['listing_id'].iloc[660:678])

# First batch of the listing ids with 60 rows and 6 columns, for easier visualization (see below)
final_listing_id_df1 = DataFrame({'listing_id_0_60' : Series(listing1), 'listing_id_60_120' : Series(listing2), 
                                 'listing_id_120_180' : Series(listing3), 'listing_id_180_240' : Series(listing4), 
                                 'listing_id_240_300' : Series(listing5), 'listing_id_300_360' : Series(listing6)})

final_listing_id_df1

|  | listing_id_0_60 | listing_id_60_120 | listing_id_120_180 | listing_id_180_240 | listing_id_240_300 | listing_id_300_360 |
|---|---|---|---|---|---|---|
| 0 | 11012 | 52525 | 9410302 | 6629132 | 1737244 | 7718139 |
| 1 | 2926776 | 8118346 | 3803212 | 7713043 | 8472954 | 7235573 |
| 2 | 1594412 | 4395349 | 5992032 | 6575407 | 7247518 | 6078382 |
| 3 | 4669377 | 7914021 | 1289082 | 278830 | 7938153 | 3888986 |
| 4 | 4993710 | 1520593 | 9157232 | 430453 | 3251069 | 8292326 |
| 5 | 9345786 | 6543683 | 2187563 | 8310398 | 1259305 | 1432713 |
| 6 | 8350401 | 496074 | 4454264 | 1063855 | 8156894 | 613151 |
| 7 | 3726391 | 6644628 | 8508223 | 6249536 | 7620570 | 5047188 |
| 8 | 6797786 | 7646637 | 607788 | 8446347 | 1571230 | 6728419 |
| 9 | 6545246 | 3925572 | 3720731 | 9531265 | 8255196 | 7921453 |


In [17]:
# Second batch of the listing ids with 60 rows and 6 columns, for easier visualization (see below)
final_listing_id_df2 = DataFrame({'listing_id_360_420' : Series(listing7), 'listing_id_420_480' : Series(listing8), 
                                 'listing_id_480_540' : Series(listing9), 'listing_id_540_600' : Series(listing10), 
                                 'listing_id_600_660' : Series(listing11), 'listing_id_660_678' : Series(listing12)})
final_listing_id_df2

|  | listing_id_360_420 | listing_id_420_480 | listing_id_480_540 | listing_id_540_600 | listing_id_600_660 | listing_id_660_678 |
|---|---|---|---|---|---|---|
| 0 | 4708075 | 8901863 | 3291777 | 180939 | 149489 | 1976382.0 |
| 1 | 4811583 | 3793185 | 4225225 | 10385 | 9920856 | 8742753.0 |
| 2 | 3282000 | 3416217 | 4022127 | 3768742 | 1601714 | 10141859.0 |
| 3 | 9545766 | 7462439 | 6421243 | 9415562 | 7596455 | 4718820.0 |
| 4 | 2818420 | 1633025 | 4291 | 11411 | 1589461 | 10695.0 |
| 5 | 9095510 | 1549973 | 7499506 | 6561811 | 6629657 | 7596934.0 |
| 6 | 5423692 | 4318814 | 9292039 | 2500188 | 9597947 | 4566393.0 |
| 7 | 3024336 | 1831338 | 6545602 | 3254956 | 1488166 | 215954.0 |
| 8 | 1732441 | 5950957 | 9238818 | 449602 | 4418480 | 6606.0 |
| 9 | 6403104 | 7050204 | 7746170 | 6759038 | 6130287 | 3303857.0 |
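The last column above shows values such as `1976382.0` because it holds only 18 ids: pandas pads the shorter Series with NaN when building the DataFrame, which turns that integer column into floats. A loop-based sketch (the helper `chunk_columns` is hypothetical) builds the same side-by-side layout without twelve separate variables, and uses pandas' nullable `Int64` dtype so padded ids stay integers:

```python
import pandas as pd


def chunk_columns(ids: pd.Series, size: int = 60) -> pd.DataFrame:
    """Lay a long Series out side by side in columns of `size` rows each."""
    ids = ids.reset_index(drop=True)
    columns = {}
    for start in range(0, len(ids), size):
        stop = min(start + size, len(ids))
        # Nullable Int64 keeps the ids as integers where the last column is padded with <NA>.
        columns[f'listing_id_{start}_{stop}'] = ids.iloc[start:stop].reset_index(drop=True).astype('Int64')
    return pd.DataFrame(columns)


wide = chunk_columns(pd.Series(range(100, 250)), size=60)
print(wide.columns.tolist())  # ['listing_id_0_60', 'listing_id_60_120', 'listing_id_120_150']
print(wide.shape)             # (60, 3)
```

On the notebook's data the call would be `chunk_columns(only_highest_listing_id_df['listing_id'])`.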


In [18]:
# Least-featured listing ids
least_listing_id_count = np.sum(df_without_nan['listing_id'].value_counts() == 1) # Number of distinct listing ids that appear only once
least_listing_id_count
# There are 7 distinct listing ids that appear the fewest times (i.e., once each).

7

In [53]:
least_listing_df = listing_id_count_df.tail(least_listing_id_count) # Series of the 7 least frequent listing ids (see below)
least_listing_df

10319529    1
656909      1
9868607     1
9714078     1
10210625    1
3819831     1
10235136    1
Name: listing_id, dtype: int64

In [21]:
# Listing id (contd.)
# To look up a range of listing ids and see how many times each appears, use the function below

def listings(first_row, sec_row):
    
    '''
    This function returns a listing id or a set of listing ids with the number of times each appears in the dataframe.
    It takes a range of row positions between 0 and 3723 to select the listing ids.
    
    INPUT:
        1. first_row: The starting row position. It must be an integer no greater than sec_row.
        2. sec_row: The ending row position (excluded from the result). It must be an integer no less than first_row.
        
    OUTPUT:
        A Series of the listing ids in the chosen range of rows, with the number of times each id 
        appears in the dataframe.
    '''
    
    if first_row < 0 or sec_row > listing_id_count_df.shape[0]:
        return 'The range is 0 to {} inclusive.'.format(listing_id_count_df.shape[0])
    if first_row > sec_row:
        return 'first_row must not be greater than sec_row.'
    else: 
        return listing_id_count_df.iloc[first_row:sec_row]


a = listings(2000, 2005) # Use-case example (see below)
a

3604931    298
7776701    298
139463     298
3624990    298
9715029    298
Name: listing_id, dtype: int64
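`listings`, `dates`, `leastPrices` and `prices` repeat the same bounds check and return an error message as a string, which a caller could mistake for a result. One alternative, sketched with a hypothetical `slice_counts` helper, is a single function that raises `ValueError` instead:

```python
import pandas as pd


def slice_counts(counts: pd.Series, first_row: int, sec_row: int) -> pd.Series:
    """Return counts.iloc[first_row:sec_row], raising ValueError on a bad range."""
    n = counts.shape[0]
    if first_row < 0 or sec_row > n:
        raise ValueError(f'The range is 0 to {n} inclusive.')
    if first_row > sec_row:
        raise ValueError('first_row must not be greater than sec_row.')
    return counts.iloc[first_row:sec_row]


counts = pd.Series([5, 3, 1], index=['a', 'b', 'c'])
print(slice_counts(counts, 0, 2).tolist())  # [5, 3]
```

`slice_counts(listing_id_count_df, 2000, 2005)` would then mirror `listings(2000, 2005)`, and the same helper covers the date and price lookups.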

**Question 2. What is the maximum number of accommodations listed per day and what is the minimum number? On which dates did they occur?**

In [32]:
date_count_df = df_without_nan['date'].value_counts() # Gives the number of times each date had accommodations listed
date_count_df

# The date with the most accommodation listings was 2017-01-01, with 2,922 accommodations listed.
# In other words, the maximum number of accommodation listings per day was 2,922.
# The date with the fewest accommodation listings was 2016-01-04, with 1,735 accommodations listed.
# In other words, the minimum number of accommodation listings per day was 1,735.
# In total, 365 distinct dates appear.

2017-01-01    2922
2016-12-31    2859
2016-12-30    2840
2016-12-29    2835
2016-12-28    2833
              ... 
2016-01-09    1856
2016-01-06    1826
2016-01-08    1782
2016-01-07    1776
2016-01-04    1735
Name: date, Length: 365, dtype: int64

In [24]:
date_count_df.shape[0] # Total number of rows = 365 (see below)

365
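Because `date` is stored as text, `value_counts()` orders dates only by count. Converting the column with `pd.to_datetime` and counting with `groupby(...).size()` gives a chronologically sorted Series, and `idxmax`/`idxmin` return the busiest and quietest dates directly (on ties, the first such date). A sketch on made-up rows:

```python
import pandas as pd

# Made-up rows standing in for df_without_nan['date'].
toy = pd.DataFrame({'date': ['2016-01-04', '2016-01-04', '2016-01-05',
                             '2016-01-05', '2016-01-05', '2016-01-06']})
toy['date'] = pd.to_datetime(toy['date'])

per_day = toy.groupby('date').size()  # listings per date, sorted by date
print(per_day.idxmax().date(), per_day.max())  # 2016-01-05 3
print(per_day.idxmin().date(), per_day.min())  # 2016-01-06 1
```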

In [26]:
# To look up any date or range of dates and see how many accommodations were listed on each, use the function below.

def dates(first_row, sec_row):
    
    '''
    This function returns a date or a set of dates with the number of times each appears in the dataframe.
    It takes a range of row positions between 0 and 365 to select the dates.
    
    INPUT:
        1. first_row: The starting row position. It must be an integer no greater than sec_row.
        2. sec_row: The ending row position (excluded from the result). It must be an integer no less than first_row.
        
    OUTPUT:
        A Series of the dates in the chosen range of rows, with the number of times each date 
        appears in the dataframe.
    '''
    
    if first_row < 0 or sec_row > date_count_df.shape[0]:
        return 'The range is 0 to {} inclusive.'.format(date_count_df.shape[0])
    if first_row > sec_row:
        return 'first_row must not be greater than sec_row.'
    else: 
        return date_count_df.iloc[first_row:sec_row]


b = dates(0, 5) # Use-case example (see below)
b

2017-01-01    2922
2016-12-31    2859
2016-12-30    2840
2016-12-29    2835
2016-12-28    2833
Name: date, dtype: int64

**Question 3. Which price tag was most often used to list accommodations and which was least used? How often was each used?**

In [33]:
# Price with highest number of occurrences

price_count_df = df_without_nan['price'].value_counts() # Gives the number of times each price was used (see below)
price_count_df

# The most frequent price is $150.00, which appears 36,646 times.
# In other words, the most commonly listed accommodations cost $150.00 per night.
# The least frequent prices appear only once each.
# In total, 669 distinct prices appear.

$150.00    36646
$100.00    31755
$75.00     29820
$125.00    27538
$65.00     26415
           ...  
$901.00        1
$744.00        1
$658.00        1
$817.00        1
$562.00        1
Name: price, Length: 669, dtype: int64

In [34]:
price_count_df.shape[0] # Total number of rows = 669

669

In [35]:
# Prices with lowest number of occurrences
least_price_count = np.sum(df_without_nan['price'].value_counts() == 1) # Number of distinct prices that appear only once
least_price_count
# There are 66 prices that appear the fewest times (i.e., only once each) (see below).

66
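`price_count_df` is indexed by price strings, so sorting its index is alphabetical: `'$1,650.00'` sorts before `'$901.00'`. A sketch that selects the prices used exactly once and orders them by numeric value, on made-up data:

```python
import pandas as pd

# Made-up price strings standing in for df_without_nan['price'].
prices = pd.Series(['$150.00', '$150.00', '$100.00', '$901.00', '$1,650.00', '$100.00'])

counts = prices.value_counts()
once_only = counts[counts == 1]  # prices that appear exactly once
# Convert the index labels to numbers before sorting, so $901.00 comes before $1,650.00.
numeric = once_only.index.str.replace(r'[$,]', '', regex=True).astype(float)
print(len(once_only))           # 2
print(sorted(numeric.tolist())) # [901.0, 1650.0]
```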

In [55]:
# To look up the least frequent prices by row range, use the function below

def leastPrices(first_row, sec_row):
    
    '''
    This function shows, at a glance, only the prices that appear the fewest times (i.e., only once) in the 
    dataframe.
    It takes a range of row positions between 0 and 66 to select the prices.
    
    INPUT:
        1. first_row: The starting row position. It must be an integer no greater than sec_row.
        2. sec_row: The ending row position (excluded from the result). It must be an integer no less than first_row.
        
    OUTPUT:
        A Series of the prices that appear the fewest times (i.e., only once each) in the dataframe.
    '''
    
    if first_row < 0 or sec_row > least_price_count:
        return 'Your range must be 0 to {} inclusive.'.format(least_price_count)
    if first_row > sec_row:
        return 'first_row must not be greater than sec_row.'
    else: 
        return price_count_df.tail(least_price_count).iloc[first_row:sec_row]


c2 = leastPrices(0, 66) # Use-case example (see below)
c2

$406.00    1
$669.00    1
$625.00    1
$788.00    1
$637.00    1
          ..
$901.00    1
$744.00    1
$658.00    1
$817.00    1
$562.00    1
Name: price, Length: 66, dtype: int64

In [39]:
# To look up any set of prices by row range, use the function below

def prices(first_row, sec_row):
    
    '''
    This function returns a list of prices with the number of times each appears in the dataframe.
    It takes a range of row positions between 0 and 669 to select the prices.
    
    INPUT:
        1. first_row: The starting row position. It must be an integer no greater than sec_row.
        2. sec_row: The ending row position (excluded from the result). It must be an integer no less than first_row.
        
    OUTPUT:
        A Series of prices with the number of times each appears in the dataframe.
    '''
    
    if first_row < 0 or sec_row > price_count_df.shape[0]:
        return 'Your range must be 0 to {} inclusive.'.format(price_count_df.shape[0])
    if first_row > sec_row:
        return 'first_row must not be greater than sec_row.'
    else: 
        return price_count_df.iloc[first_row:sec_row]


c1 = prices(6, 10) # Use-case example (see below)
c1

$95.00    24327
$99.00    23629
$85.00    23455
$80.00    19817
Name: price, dtype: int64

**Question 4. Is there any correlation among the feature variables?**

In [40]:
# 'listing_id' column and others

# First determine the range of values you need to input by using the function below
def listIdRange(list_id):
    
    '''
    This function determines the range to use to filter out a dataframe of the listing id, together with other columns.
    It takes a valid listing id as input and outputs a range of row values to be input in the function in the next cell
    called 'listPlusOthers'.
    
    INPUT:
        1. list_id: A valid listing id from any of the dataframes that have been cleaned of missing values 
        (e.g. listing_id_count_df)
    
    OUTPUT:
        Output is a range of values that can be input, inclusive of the boundary values.
    '''
    listing_id_with_others = df_without_nan.loc[df_without_nan['listing_id'] == list_id, ['listing_id', 'date', 'price']]
    list_length = listing_id_with_others.shape[0]
    
    if list_length == 0:
        return 'Please input a valid listing_id.'
    else:
        return 'Your range is 0 - {}.'.format(list_length)


d1 = listIdRange(4993710) # Use-case example (see below)
d1

'Your range is 0 - 365.'

In [41]:
# 'listing_id' column and others in a range

# Now generate a dataframe covering your range of choice

def listPlusOthers(list_id, first_row, sec_row):
    
    '''
    This function follows on from 'listIdRange' above. It gives a dataframe of a selected range of rows
    for a chosen listing id, together with other columns for possible comparison.
    It takes a valid listing id, starting row number and ending row number as inputs and outputs a dataframe of 'listing_id',
    'date' and 'price'.
    
    INPUT:
        1. list_id: A valid listing id from any of the dataframes that have been cleaned of missing values 
        (e.g. listing_id_count_df)
        2. first_row: The starting row number.
        3. sec_row: The ending row number.
    
    OUTPUT:
        Output is a dataframe of the chosen listing id, date and price, covering a range of the starting row number and the 
        ending row number.
    '''
    
    listing_id_with_others = df_without_nan.loc[df_without_nan['listing_id'] == list_id, ['listing_id', 'date', 'price']] # All rows for this listing id, with the needed columns
    list_length = listing_id_with_others.shape[0]
    
    if list_length == 0:
        return 'Please input a valid listing_id.'
    elif first_row < 0 or sec_row > list_length:
        return 'Your range must be 0 to {} inclusive.'.format(list_length)
    elif first_row > sec_row:
        return 'first_row must not be greater than sec_row.'
    else:
        listing_id_with_others = listing_id_with_others.iloc[first_row:sec_row] # Keep only the chosen range of rows
        col_reset_listing_id_df = listing_id_with_others.reset_index() # Move the original row labels into a column named 'index'
        col_reset_listing_id_df.rename(columns={'index': 'serial_no'}, inplace=True) # Rename that column to 'serial_no'
        return col_reset_listing_id_df


d2 = listPlusOthers(4993710, 0, 10) # Use-case example (see below)
d2

# No definite correlation is apparent between listing ids and the other feature variables: a single listing id appears on
# many different dates and with different nightly prices. Since listing_id is an identifier rather than a measured quantity,
# a numeric correlation coefficient would not be meaningful for it.

|  | serial_no | listing_id | date | price |
|---|---|---|---|---|
| 0 | 1375685 | 4993710 | 2016-01-04 | $99.00 |
| 1 | 1375686 | 4993710 | 2016-01-05 | $99.00 |
| 2 | 1375687 | 4993710 | 2016-01-06 | $99.00 |
| 3 | 1375688 | 4993710 | 2016-01-07 | $99.00 |
| 4 | 1375689 | 4993710 | 2016-01-08 | $104.00 |
| 5 | 1375690 | 4993710 | 2016-01-09 | $104.00 |
| 6 | 1375691 | 4993710 | 2016-01-10 | $99.00 |
| 7 | 1375692 | 4993710 | 2016-01-11 | $99.00 |
| 8 | 1375693 | 4993710 | 2016-01-12 | $99.00 |
| 9 | 1375694 | 4993710 | 2016-01-13 | $99.00 |
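The three `...Range` / `...PlusOthers` pairs in this section differ only in the column they filter on. A hypothetical generic helper, `rows_matching`, could cover all three; unlike the originals, it keeps every column of the input DataFrame in its own order and raises `ValueError` rather than returning a message:

```python
import pandas as pd


def rows_matching(df: pd.DataFrame, column: str, value, first_row: int, sec_row: int) -> pd.DataFrame:
    """Return rows first_row:sec_row of the rows where df[column] == value."""
    matches = df.loc[df[column] == value]
    n = matches.shape[0]
    if n == 0:
        raise ValueError(f'No rows where {column} == {value!r}.')
    if first_row < 0 or sec_row > n or first_row > sec_row:
        raise ValueError(f'Choose 0 <= first_row <= sec_row <= {n}.')
    # Keep the original row labels in a 'serial_no' column, as the helpers above do.
    return matches.iloc[first_row:sec_row].reset_index().rename(columns={'index': 'serial_no'})


# Made-up rows in the shape of df_without_nan (without the 'available' column).
toy = pd.DataFrame({
    'listing_id': [241032, 241032, 953595, 241032],
    'date': ['2016-01-04', '2016-01-05', '2016-01-05', '2016-01-13'],
    'price': ['$85.00', '$85.00', '$222.00', '$85.00'],
})
print(rows_matching(toy, 'price', '$85.00', 0, 2))
```

Called as `rows_matching(df_without_nan, 'listing_id', 4993710, 0, 10)`, it would return the same rows as `d2`, with the `available` column also included.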


In [42]:
# 'date' column and others

# First determine the range of values you need to input by using the function below.

def dateRange(aDate):
    
    '''
    This function determines the range to use to filter out a dataframe of the date, together with other columns.
    It takes a valid date as input and outputs a range of row values to be input in the function in the next cell
    called 'datePlusOthers'.
    
    INPUT:
        1. aDate: A valid date in the format 'yyyy-mm-dd', from any of the dataframes that have been cleaned of missing values 
        (e.g. date_count_df).
    
    OUTPUT:
        Output is a range of values that can be input, inclusive of the boundary values.
    '''
    date_with_others = df_without_nan.loc[df_without_nan['date'] == aDate, ['date', 'listing_id', 'price']]
    date_length = date_with_others.shape[0]
    
    if date_length == 0:
        return 'Please input a valid date.'
    else:
        return 'Your range is 0 - {}.'.format(date_length)


e1 = dateRange('2016-12-25') # Use-case example (see below)
e1

'Your range is 0 - 2829.'

In [43]:
# 'date' column and others in a range

# Now generate a dataframe covering your range of choice.

def datePlusOthers(aDate, first_row, sec_row):
    
    '''
    This function follows on from 'dateRange' above. It gives a dataframe of a selected range of rows
    for a chosen date, together with other columns for possible comparison.
    It takes a valid date, starting row number and ending row number as inputs and outputs a dataframe of 'date',
    'listing_id' and 'price'.
    
    INPUT:
        1. aDate: A valid date from any of the dataframes that have been cleaned of missing values 
        (e.g. date_count_df)
        2. first_row: The starting row number.
        3. sec_row: The ending row number.
    
    OUTPUT:
        Output is a dataframe of the chosen date, listing id and price, covering a range of the starting row number and the 
        ending row number.
    '''
    
    date_with_others = df_without_nan.loc[df_without_nan['date'] == aDate, ['date', 'listing_id', 'price']] # All rows for this date, with the needed columns
    date_length = date_with_others.shape[0]
    
    if date_length == 0:
        return 'Please input a valid date.'
    elif first_row < 0 or sec_row > date_length:
        return 'Your range must be 0 to {} inclusive.'.format(date_length)
    elif first_row > sec_row:
        return 'first_row must not be greater than sec_row.'
    else:
        date_with_others = date_with_others.iloc[first_row:sec_row] # Keep only the chosen range of rows
        col_reset_date_df = date_with_others.reset_index() # Move the original row labels into a column named 'index'
        col_reset_date_df.rename(columns={'index': 'serial_no'}, inplace=True) # Rename that column to 'serial_no'
        return col_reset_date_df


e2 = datePlusOthers('2016-12-25', 0, 5) # Use-case example (see below)
e2

# No definite correlation is apparent between dates and the other feature variables: on a single date, many different
# listing ids are listed, at different price tags.

|  | serial_no | date | listing_id | price |
|---|---|---|---|---|
| 0 | 356 | 2016-12-25 | 241032 | $85.00 |
| 1 | 721 | 2016-12-25 | 953595 | $222.00 |
| 2 | 1086 | 2016-12-25 | 3308979 | $1,650.00 |
| 3 | 1451 | 2016-12-25 | 7421966 | $100.00 |
| 4 | 1816 | 2016-12-25 | 278830 | $450.00 |


In [44]:
# 'price' column and others

# First determine the range of values you need to input by using the function below.

def priceRange(price):
    
    '''
    This function determines the range to use to filter out a dataframe of the price, together with other columns.
    It takes a valid price as input and outputs a range of row values to be input in the function in the next cell
    called 'pricePlusOthers'.
    
    INPUT:
        1. price: A valid price in the format '$85.00' (for instance), from any of the dataframes that have been cleaned of missing values 
        (e.g. price_count_df).
    
    OUTPUT:
        Output is a range of values that can be input, inclusive of the boundary values.
    '''
    price_with_others = df_without_nan.loc[df_without_nan['price'] == price, ['price', 'date', 'listing_id']]
    price_length = price_with_others.shape[0]
    
    if price_length == 0:
        return 'Please input a valid price.'
    else:
        return 'Your range is 0 - {}.'.format(price_length)


f1 = priceRange('$85.00') # Use-case example (see below)
f1

'Your range is 0 - 23455.'
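`priceRange` and `pricePlusOthers` page through rows one price at a time. To summarize how widely each price tag is spread, a named aggregation can count the distinct listing ids and dates per price. The rows below are made up, loosely echoing outputs shown in this section:

```python
import pandas as pd

# Made-up rows in the shape of df_without_nan.
toy = pd.DataFrame({
    'listing_id': [241032, 241032, 9282409, 9282409, 953595],
    'date': ['2016-01-04', '2016-01-05', '2016-10-22', '2016-10-23', '2016-12-25'],
    'price': ['$85.00', '$85.00', '$85.00', '$85.00', '$222.00'],
})

# For each price tag, count how many distinct listings and distinct dates use it.
spread = toy.groupby('price').agg(listings=('listing_id', 'nunique'), dates=('date', 'nunique'))
print(spread)
```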

In [48]:
# 'price' column and others in a range

# Now generate a dataframe covering your range of choice.

def pricePlusOthers(price, first_row, sec_row):
    
    '''
    This function follows on from 'priceRange' above. It gives a dataframe of a selected range of rows
    for a chosen price, together with other columns for possible comparison.
    It takes a valid price, starting row number and ending row number as inputs and outputs a dataframe of 'price',
    'listing_id' and 'date'.
    
    INPUT:
        1. price: A valid price from any of the dataframes that have been cleaned of missing values 
        (e.g. price_count_df)
        2. first_row: The starting row number.
        3. sec_row: The ending row number.
    
    OUTPUT:
        Output is a dataframe of the chosen price, listing id and date, covering a range of the starting row number and the 
        ending row number.
    '''
    
    price_with_others = df_without_nan.loc[df_without_nan['price'] == price, ['price', 'date', 'listing_id']] # All rows for this price, with the needed columns
    price_length = price_with_others.shape[0]
    
    if price_length == 0:
        return 'Please input a valid price.'
    elif first_row < 0 or sec_row > price_length:
        return 'Your range must be 0 to {} inclusive.'.format(price_length)
    elif first_row > sec_row:
        return 'first_row must not be greater than sec_row.'
    else:
        price_with_others = price_with_others.iloc[first_row:sec_row] # Keep only the chosen range of rows
        col_reset_price_df = price_with_others.reset_index() # Move the original row labels into a column named 'index'
        col_reset_price_df.rename(columns={'index': 'serial_no'}, inplace=True) # Rename that column to 'serial_no'
        return col_reset_price_df


f2 = pricePlusOthers('$85.00', 0, 600) # Use-case example (see below)
f2

# No definite correlation is apparent between prices and the other feature variables: a single price tag is used by
# many different listing ids, on different dates.

|  | serial_no | price | date | listing_id |
|---|---|---|---|---|
| 0 | 0 | $85.00 | 2016-01-04 | 241032 |
| 1 | 1 | $85.00 | 2016-01-05 | 241032 |
| 2 | 9 | $85.00 | 2016-01-13 | 241032 |
| 3 | 10 | $85.00 | 2016-01-14 | 241032 |
| 4 | 14 | $85.00 | 2016-01-18 | 241032 |
| ... | ... | ... | ... | ... |
| 595 | 21827 | $85.00 | 2016-10-22 | 9282409 |
| 596 | 21828 | $85.00 | 2016-10-23 | 9282409 |
| 597 | 21829 | $85.00 | 2016-10-24 | 9282409 |
| 598 | 21830 | $85.00 | 2016-10-25 | 9282409 |
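The conclusions for Question 4 are drawn by inspecting rows. A numeric check is possible for the quantities that have a natural order: converting `price` to a number and `date` to a day of the year allows a Pearson correlation to be computed, for example separately within each listing, while `listing_id` (a label) is used only for grouping. A sketch on made-up rows, implying no result for the real data:

```python
import pandas as pd

# Made-up rows in the shape of df_without_nan.
toy = pd.DataFrame({
    'listing_id': [1, 1, 1, 2, 2, 2],
    'date': ['2016-01-04', '2016-01-05', '2016-01-06'] * 2,
    'price': ['$99.00', '$104.00', '$110.00', '$85.00', '$85.00', '$80.00'],
})
toy['price_num'] = toy['price'].str.replace(r'[$,]', '', regex=True).astype(float)
toy['day'] = pd.to_datetime(toy['date']).dt.dayofyear

# Pearson correlation between day of year and nightly price, computed within each listing.
per_listing = toy.groupby('listing_id')[['day', 'price_num']].apply(
    lambda g: g['day'].corr(g['price_num'])
)
print(per_listing.round(3))
```

Coefficients near zero for most listings would support the "no definite correlation" reading; consistently large values would suggest seasonal pricing worth a closer look.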
