# Basics - Indexing, Labelling and Ordering

Using AirBnB data set: https://www.kaggle.com/dgomonov/new-york-city-airbnb-open-data

* set_index
* reset_index
* sort_values
* sort_index
* unique
* value_counts
* rank

In [1]:
import pandas as pd

df = pd.read_csv("AB_NYC_2019.csv")
df.head(3)

Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
0,2539,Clean & quiet apt home by the park,2787,John,Brooklyn,Kensington,40.64749,-73.97237,Private room,149,1,9,2018-10-19,0.21,6,365
1,2595,Skylit Midtown Castle,2845,Jennifer,Manhattan,Midtown,40.75362,-73.98377,Entire home/apt,225,1,45,2019-05-21,0.38,2,355
2,3647,THE VILLAGE OF HARLEM....NEW YORK !,4632,Elisabeth,Manhattan,Harlem,40.80902,-73.9419,Private room,150,3,0,,,1,365


## Indexing
The index is the column of labels on the left; its values identify each row. By default, the index is generated by counting up from zero. In this data set, however, the `id` column (the primary key in the source database) would also be a good choice.

In [69]:
df2 = df.set_index("id")
df2.head(3)

id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
2611458,close to Manhattan country setting,13373889,Aaron,Staten Island,Concord,40.60375,-74.08065,Private room,129,1,40,2018-10-14,0.85,2,86
13370393,Charming.,13373889,Aaron,Staten Island,Concord,40.60556,-74.08274,Entire home/apt,150,7,1,2018-11-04,0.12,2,83
19970350,Newly renovated clean and Cozy Private room,15344412,Abe,Staten Island,New Springville,40.58085,-74.15443,Private room,43,10,0,,,3,89


In [70]:
# look up the name of the row whose index label (id) is 2539
df2.name[2539]

'Clean & quiet apt home by the park'
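
Once the index is no longer the default counter, label-based and position-based lookups give different answers. The following is a minimal sketch on a small hand-made frame; the three rows are copied from the preview at the top, and the names `listings` and `by_id` are illustrative only.

```python
import pandas as pd

# Toy frame with three rows taken from the listing preview above
listings = pd.DataFrame({
    "id": [2539, 2595, 3647],
    "name": [
        "Clean & quiet apt home by the park",
        "Skylit Midtown Castle",
        "THE VILLAGE OF HARLEM....NEW YORK !",
    ],
    "price": [149, 225, 150],
})

# verify_integrity=True raises ValueError if the new index contains duplicates
by_id = listings.set_index("id", verify_integrity=True)

# Label-based lookup: row with index label 2539, column "name"
name = by_id.loc[2539, "name"]
print(name)

# Position-based lookup: first row, regardless of its label
first_price = by_id.iloc[0]["price"]
print(first_price)
```

`.loc` always interprets its argument as a label, while `.iloc` always interprets it as a position, which avoids ambiguity when the index holds integers such as `id`.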

In [6]:

# average every numeric column per room type; room_type becomes the index
df3 = df.groupby("room_type").mean(numeric_only=True)
df3


room_type,id,host_id,latitude,longitude,price,minimum_nights,number_of_reviews,reviews_per_month,calculated_host_listings_count,availability_365
Entire home/apt,18438180.0,61755930.0,40.728649,-73.960696,211.794246,8.506907,22.842418,1.306578,10.698335,111.920304
Private room,19468930.0,72475140.0,40.729208,-73.942924,89.780973,5.3779,24.112962,1.445209,3.227717,111.203933
Shared room,23003780.0,102624100.0,40.730514,-73.943343,70.127586,6.475,16.6,1.471726,4.662931,162.000862


In [9]:
df3.reset_index()

Unnamed: 0,room_type,id,host_id,latitude,longitude,price,minimum_nights,number_of_reviews,reviews_per_month,calculated_host_listings_count,availability_365
0,Entire home/apt,18438180.0,61755930.0,40.728649,-73.960696,211.794246,8.506907,22.842418,1.306578,10.698335,111.920304
1,Private room,19468930.0,72475140.0,40.729208,-73.942924,89.780973,5.3779,24.112962,1.445209,3.227717,111.203933
2,Shared room,23003780.0,102624100.0,40.730514,-73.943343,70.127586,6.475,16.6,1.471726,4.662931,162.000862


In [6]:
df3.reset_index(drop=True)

Unnamed: 0,id,host_id,latitude,longitude,price,minimum_nights,number_of_reviews,reviews_per_month,calculated_host_listings_count,availability_365
0,18438180.0,61755930.0,40.728649,-73.960696,211.794246,8.506907,22.842418,1.306578,10.698335,111.920304
1,19468930.0,72475140.0,40.729208,-73.942924,89.780973,5.3779,24.112962,1.445209,3.227717,111.203933
2,23003780.0,102624100.0,40.730514,-73.943343,70.127586,6.475,16.6,1.471726,4.662931,162.000862


## Sorting

Use `sort_index` to order rows by their index labels, which is most useful after setting a meaningful index. To order rows by the values in one or more columns, use `sort_values`.

In [34]:
df3.sort_index()

room_type,id,host_id,latitude,longitude,price,minimum_nights,number_of_reviews,reviews_per_month,calculated_host_listings_count,availability_365
Entire home/apt,18438180.0,61755930.0,40.728649,-73.960696,211.794246,8.506907,22.842418,1.306578,10.698335,111.920304
Private room,19468930.0,72475140.0,40.729208,-73.942924,89.780973,5.3779,24.112962,1.445209,3.227717,111.203933
Shared room,23003780.0,102624100.0,40.730514,-73.943343,70.127586,6.475,16.6,1.471726,4.662931,162.000862


In [35]:
df3.sort_index(ascending=False)

room_type,id,host_id,latitude,longitude,price,minimum_nights,number_of_reviews,reviews_per_month,calculated_host_listings_count,availability_365
Shared room,23003780.0,102624100.0,40.730514,-73.943343,70.127586,6.475,16.6,1.471726,4.662931,162.000862
Private room,19468930.0,72475140.0,40.729208,-73.942924,89.780973,5.3779,24.112962,1.445209,3.227717,111.203933
Entire home/apt,18438180.0,61755930.0,40.728649,-73.960696,211.794246,8.506907,22.842418,1.306578,10.698335,111.920304


In [37]:
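# sort_values returns a new, sorted DataFrame; the result is not assigned here,
# so df itself is unchanged and df.head(3) still shows the original order
df.sort_values(["host_name"], ascending=[True])
df.head(3)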

Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
0,2539,Clean & quiet apt home by the park,2787,John,Brooklyn,Kensington,40.64749,-73.97237,Private room,149,1,9,2018-10-19,0.21,6,365
1,2595,Skylit Midtown Castle,2845,Jennifer,Manhattan,Midtown,40.75362,-73.98377,Entire home/apt,225,1,45,2019-05-21,0.38,2,355
2,3647,THE VILLAGE OF HARLEM....NEW YORK !,4632,Elisabeth,Manhattan,Harlem,40.80902,-73.9419,Private room,150,3,0,,,1,365
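
The call above returns a sorted copy rather than changing `df`, which is why the preview is still in file order. A minimal sketch of that behaviour, using a hypothetical three-row frame named `hosts`:

```python
import pandas as pd

hosts = pd.DataFrame({
    "host_name": ["John", "Jennifer", "Elisabeth"],
    "price": [149, 225, 150],
})

# The sorted result is a new object; `hosts` keeps its original row order
sorted_hosts = hosts.sort_values("host_name")

print(hosts["host_name"].tolist())         # original order
print(sorted_hosts["host_name"].tolist())  # alphabetical order

# The original row labels travel with the rows
print(sorted_hosts.index.tolist())
```

To keep the sorted order, assign the result to a name (as here) or pass `inplace=True`.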


In [38]:
df.sort_values(["host_name"])
df.head(3)

Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
0,2539,Clean & quiet apt home by the park,2787,John,Brooklyn,Kensington,40.64749,-73.97237,Private room,149,1,9,2018-10-19,0.21,6,365
1,2595,Skylit Midtown Castle,2845,Jennifer,Manhattan,Midtown,40.75362,-73.98377,Entire home/apt,225,1,45,2019-05-21,0.38,2,355
2,3647,THE VILLAGE OF HARLEM....NEW YORK !,4632,Elisabeth,Manhattan,Harlem,40.80902,-73.9419,Private room,150,3,0,,,1,365


In [15]:
df.sort_values(["neighbourhood_group", "host_name"])
df.head(3)

Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
0,2539,Clean & quiet apt home by the park,2787,John,Brooklyn,Kensington,40.64749,-73.97237,Private room,149,1,9,2018-10-19,0.21,6,365
1,2595,Skylit Midtown Castle,2845,Jennifer,Manhattan,Midtown,40.75362,-73.98377,Entire home/apt,225,1,45,2019-05-21,0.38,2,355
2,3647,THE VILLAGE OF HARLEM....NEW YORK !,4632,Elisabeth,Manhattan,Harlem,40.80902,-73.9419,Private room,150,3,0,,,1,365


In [39]:
df.host_name.unique()

array(['John', 'Jennifer', 'Elisabeth', ..., 'Abayomi', 'Alberth',
       'Ilgar & Aysel'], dtype=object)

In [40]:
df.host_name.value_counts()

Michael              417
David                403
Sonder (NYC)         327
John                 294
Alex                 279
                    ... 
Rhonycs                1
Brandy-Courtney        1
Shanthony              1
Aurore And Jamila      1
Ilgar & Aysel          1
Name: host_name, Length: 11452, dtype: int64

In [41]:
df.neighbourhood_group.unique()

array(['Brooklyn', 'Manhattan', 'Queens', 'Staten Island', 'Bronx'],
      dtype=object)

In [42]:
df.neighbourhood_group.value_counts()

Manhattan        21661
Brooklyn         20104
Queens            5666
Bronx             1091
Staten Island      373
Name: neighbourhood_group, dtype: int64
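
The two methods answer different questions: `unique` lists the distinct values in order of first appearance, while `value_counts` counts how often each one occurs and sorts by frequency. A small sketch on a hypothetical Series named `boroughs` (not taken from the data set):

```python
import pandas as pd

boroughs = pd.Series(
    ["Brooklyn", "Manhattan", "Manhattan", "Queens", "Manhattan", "Brooklyn"]
)

# Distinct values, in order of first appearance
distinct = boroughs.unique().tolist()
print(distinct)

# Number of distinct values
n_distinct = boroughs.nunique()
print(n_distinct)

# Occurrences per value, most frequent first
counts = boroughs.value_counts()
print(counts)

# normalize=True reports shares of the total instead of raw counts
shares = boroughs.value_counts(normalize=True)
print(shares)
```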

In [44]:
df.sort_values(["neighbourhood_group", "host_name"])
df.head(3)

Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
4079,2611458,close to Manhattan country setting,13373889,Aaron,Staten Island,Concord,40.60375,-74.08065,Private room,129,1,40,2018-10-14,0.85,2,86
16714,13370393,Charming.,13373889,Aaron,Staten Island,Concord,40.60556,-74.08274,Entire home/apt,150,7,1,2018-11-04,0.12,2,83
24922,19970350,Newly renovated clean and Cozy Private room,15344412,Abe,Staten Island,New Springville,40.58085,-74.15443,Private room,43,10,0,,,3,89


In [47]:
df.sort_values(["neighbourhood_group", "host_name"], ascending=[False, True], inplace=True)
df.head(3)

Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
4079,2611458,close to Manhattan country setting,13373889,Aaron,Staten Island,Concord,40.60375,-74.08065,Private room,129,1,40,2018-10-14,0.85,2,86
16714,13370393,Charming.,13373889,Aaron,Staten Island,Concord,40.60556,-74.08274,Entire home/apt,150,7,1,2018-11-04,0.12,2,83
24922,19970350,Newly renovated clean and Cozy Private room,15344412,Abe,Staten Island,New Springville,40.58085,-74.15443,Private room,43,10,0,,,3,89
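
When sorting by several columns, `ascending` can take one flag per sort key. The sketch below repeats the pattern from the cell above on a hypothetical four-row frame named `stays`; the `ignore_index` argument shown at the end assumes pandas 1.0 or later.

```python
import pandas as pd

stays = pd.DataFrame({
    "neighbourhood_group": ["Bronx", "Staten Island", "Bronx", "Staten Island"],
    "host_name": ["Zed", "Abe", "Amy", "Aaron"],
})

# One ascending flag per key: groups Z-to-A, hosts A-to-Z within each group
ordered = stays.sort_values(
    ["neighbourhood_group", "host_name"],
    ascending=[False, True],
)
print(ordered)

# ignore_index=True renumbers the result 0..n-1 instead of keeping old labels
renumbered = stays.sort_values(
    ["neighbourhood_group", "host_name"],
    ascending=[False, True],
    ignore_index=True,
)
print(renumbered.index.tolist())
```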


`inplace` is a parameter accepted by many pandas methods, including `set_index()`, `dropna()`, `fillna()`, `reset_index()`, `drop()`, `replace()` and `sort_values()`. Its default value is `False`, in which case the method returns a modified copy and leaves the original object unchanged. With `inplace=True`, the original object is modified directly.

In [58]:
# import required module
import pandas as pd


In [59]:
# creating dataframe
dataframe = pd.DataFrame({'Name':['Shobhit','vaibhav', 'vimal','Sourabh'], 'Class':[11,12,10,9], 'Age':[18,20,21,17]})

# Checking created dataframe
display(dataframe)

Unnamed: 0,Name,Class,Age
0,Shobhit,11,18
1,vaibhav,12,20
2,vimal,10,21
3,Sourabh,9,17


To see the effect of `inplace`, we use the `rename` method to rename the “Name” column to “FirstName”.

In [60]:
# without using inplace renaming the column
new_data = dataframe.rename(columns = {'Name':'FirstName'})

# check new_data
display(new_data)


Unnamed: 0,FirstName,Class,Age
0,Shobhit,11,18
1,vaibhav,12,20
2,vimal,10,21
3,Sourabh,9,17


In [61]:
# putting inplace=False
new_data_2 = dataframe.rename(columns = {'Name':'FirstName'},
                            inplace = False)

#check new_data_2
display(new_data_2)


Unnamed: 0,FirstName,Class,Age
0,Shobhit,11,18
1,vaibhav,12,20
2,vimal,10,21
3,Sourabh,9,17


In [62]:
# Putting Inplace=True
dataframe.rename(columns = {'Name':'FirstName'},
                        inplace = True)

# check whether dataframe is modified or not
print(dataframe)


  FirstName  Class  Age
0   Shobhit     11   18
1   vaibhav     12   20
2     vimal     10   21
3   Sourabh      9   17


In [63]:
# importing pandas
import pandas as pd

# creating dataframe
dataframe=pd.DataFrame({'Name':['Shobhit','Vaibhav','Vimal','Sourabh'],'Class':[11,12,10,9],'Age':[18,20,21,17]})

# Checking created dataframe
# Original dataframe
display(dataframe)

# without using inplace renaming the column
new_data = dataframe.rename(columns = {'Name':'FirstName'})

# Copied dataframe
display(new_data)

# checking whether dataframe is modified or not
# Original dataframe
display(dataframe)

# putting inplace=False
new_data_2 = dataframe.rename(columns = {'Name':'FirstName'},
                            inplace = False)

# Copied dataframe
display(new_data_2)

# checking whether dataframe is modified or not
# Original dataframe
display(dataframe)

# Putting Inplace=True
dataframe.rename(columns = {'Name':'FirstName'},
                        inplace = True)

# checking whether dataframe is modified or not
# Original dataframe
display(dataframe)


Unnamed: 0,Name,Class,Age
0,Shobhit,11,18
1,Vaibhav,12,20
2,Vimal,10,21
3,Sourabh,9,17


Unnamed: 0,FirstName,Class,Age
0,Shobhit,11,18
1,Vaibhav,12,20
2,Vimal,10,21
3,Sourabh,9,17


Unnamed: 0,Name,Class,Age
0,Shobhit,11,18
1,Vaibhav,12,20
2,Vimal,10,21
3,Sourabh,9,17


Unnamed: 0,FirstName,Class,Age
0,Shobhit,11,18
1,Vaibhav,12,20
2,Vimal,10,21
3,Sourabh,9,17


Unnamed: 0,Name,Class,Age
0,Shobhit,11,18
1,Vaibhav,12,20
2,Vimal,10,21
3,Sourabh,9,17


Unnamed: 0,FirstName,Class,Age
0,Shobhit,11,18
1,Vaibhav,12,20
2,Vimal,10,21
3,Sourabh,9,17
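
The return value is easy to trip over: a call with `inplace=True` returns `None`, so its result should not be assigned back to a variable. A minimal sketch using `sort_values` on a hypothetical frame named `scores`:

```python
import pandas as pd

scores = pd.DataFrame({"name": ["b", "a"], "score": [1, 2]})

# Default (inplace=False): a new frame comes back, `scores` is untouched
sorted_copy = scores.sort_values("name")
print(scores["name"].tolist())
print(sorted_copy["name"].tolist())

# inplace=True: `scores` itself is reordered and the call returns None
result = scores.sort_values("name", inplace=True)
print(result)
print(scores["name"].tolist())
```

Writing `scores = scores.sort_values("name", inplace=True)` would therefore replace the frame with `None`.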


## Rank

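Ranking is like sorting, but instead of reordering rows it assigns each value its position, and the `method` argument decides how ties are handled. With `method="max"`, every tied value receives the highest rank in its group, so the three listings priced at 10000 below all get rank 3.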

In [65]:
dfp = df.sort_values("price", ascending=False)
dfp[["id", "host_name", "price"]].head(5)

Unnamed: 0,id,host_name,price
9151,7003697,Kathrine,10000
17692,13894339,Erin,10000
29238,22436899,Jelena,10000
12342,9528920,Amy,9999
40433,31340283,Matt,9999


In [66]:
dfp["price_rank"] = dfp.price.rank(method="max", ascending=False)

In [68]:
dfp[["id", "host_name", "price", "price_rank"]].head(5)

Unnamed: 0,id,host_name,price,price_rank
9151,7003697,Kathrine,10000,3.0
17692,13894339,Erin,10000,3.0
29238,22436899,Jelena,10000,3.0
12342,9528920,Amy,9999,6.0
40433,31340283,Matt,9999,6.0
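
To compare the tie-handling options side by side, the sketch below ranks a hypothetical price Series with each `method` that `Series.rank` accepts. The values are chosen to mimic the ties above and are not read from the data set.

```python
import pandas as pd

# Three listings tied at 10000, three tied at 9999, one cheaper listing
prices = pd.Series([10000, 10000, 10000, 9999, 9999, 9999, 500])

methods = ["average", "min", "max", "first", "dense"]

# One column of ranks per method; ascending=False ranks the highest price first
ranks = pd.DataFrame(
    {m: prices.rank(method=m, ascending=False) for m in methods}
)
ranks.insert(0, "price", prices)
print(ranks)
```

For the three listings tied at the top, `min` gives 1, `max` gives 3, `average` gives 2, `dense` gives 1, and `first` gives 1, 2 and 3 in order of appearance. `dense` is the only method that leaves no gaps, so the 9999 group gets rank 2 there rather than 4, 5 or 6.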
