# SLU3 - Transforming Data: Learning notebook

In this notebook we'll cover the following:

- Dropping rows and columns
- Inplace
- Copying DataFrames
- Basic math operations with DataFrames
- Group by (split/apply/combine)
- Concat
- Sorting the index
- Resetting the index
- Setting the index
- Sorting values

We start by importing pandas:

In [1]:
import pandas as pd

# Show fewer rows in the notebook's cell outputs
pd.options.display.max_rows = 6

The dataset that we'll use in this unit is located in the __data__ directory and is called __airbnb_rooms.csv__.

We'll use the __read_csv()__ function to load the dataset into a pandas DataFrame. The __index_col__ argument sets the __room_id__ column as the DataFrame index.

In [2]:
# Read the data in file airbnb_rooms.csv into a pandas DataFrame and use column room_id as the DataFrame index.
df = pd.read_csv('data/airbnb_rooms.csv', index_col='room_id')

# Preview the first rows of the DataFrame.
df.head()

room_id,host_id,room_type,neighborhood,reviews,overall_satisfaction,accommodates,bedrooms,price
6499,14455,Entire home/apt,Belém,8,5.0,2,1.0,57.0
17031,66015,Entire home/apt,Alvalade,0,0.0,2,1.0,46.0
25659,107347,Entire home/apt,Santa Maria Maior,63,5.0,3,1.0,69.0
29248,125768,Entire home/apt,Santa Maria Maior,225,4.5,4,1.0,58.0
29396,126415,Entire home/apt,Santa Maria Maior,132,5.0,4,1.0,67.0
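
To see what index_col does without needing the CSV file, here is a minimal sketch that reads a tiny in-memory CSV. The two data rows are copied from the preview above, but the reduced column set and the names `csv_text` and `rooms` are assumptions made for illustration.

```python
import io

import pandas as pd

# A tiny CSV kept in memory; only three of the real columns are used here
csv_text = "room_id,host_id,price\n6499,14455,57.0\n17031,66015,46.0\n"

# index_col moves room_id out of the columns and into the index
rooms = pd.read_csv(io.StringIO(csv_text), index_col='room_id')

print(rooms.index.name)        # room_id
print(rooms.columns.tolist())  # ['host_id', 'price']
```

Because room_id is now the index, a single room can be looked up by label, for example with `rooms.loc[6499]`.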


## Dropping rows and columns

To drop rows or columns from a DataFrame, we can use the [drop](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.drop.html) method.

In order to drop a row, we do the following:

In [3]:
# This drops the row with index 17031
df.drop(labels=17031)

room_id,host_id,room_type,neighborhood,reviews,overall_satisfaction,accommodates,bedrooms,price
6499,14455,Entire home/apt,Belém,8,5.0,2,1.0,57.0
25659,107347,Entire home/apt,Santa Maria Maior,63,5.0,3,1.0,69.0
29248,125768,Entire home/apt,Santa Maria Maior,225,4.5,4,1.0,58.0
...,...,...,...,...,...,...,...,...
19396300,6115933,Entire home/apt,Santo António,0,0.0,6,4.0,138.0
19397373,97139334,Entire home/apt,São Vicente,0,0.0,4,1.0,56.0
19400722,28219108,Entire home/apt,Areeiro,0,0.0,5,3.0,75.0


In order to drop a column, we do the following:

In [4]:
# This drops column neighborhood
df.drop(columns='neighborhood')

room_id,host_id,room_type,reviews,overall_satisfaction,accommodates,bedrooms,price
6499,14455,Entire home/apt,8,5.0,2,1.0,57.0
17031,66015,Entire home/apt,0,0.0,2,1.0,46.0
25659,107347,Entire home/apt,63,5.0,3,1.0,69.0
...,...,...,...,...,...,...,...
19396300,6115933,Entire home/apt,0,0.0,6,4.0,138.0
19397373,97139334,Entire home/apt,0,0.0,4,1.0,56.0
19400722,28219108,Entire home/apt,0,0.0,5,3.0,75.0


If we want to drop multiple rows (or columns), we can use lists:

In [5]:
df.drop(labels=[6499, 17031])

room_id,host_id,room_type,neighborhood,reviews,overall_satisfaction,accommodates,bedrooms,price
25659,107347,Entire home/apt,Santa Maria Maior,63,5.0,3,1.0,69.0
29248,125768,Entire home/apt,Santa Maria Maior,225,4.5,4,1.0,58.0
29396,126415,Entire home/apt,Santa Maria Maior,132,5.0,4,1.0,67.0
...,...,...,...,...,...,...,...,...
19396300,6115933,Entire home/apt,Santo António,0,0.0,6,4.0,138.0
19397373,97139334,Entire home/apt,São Vicente,0,0.0,4,1.0,56.0
19400722,28219108,Entire home/apt,Areeiro,0,0.0,5,3.0,75.0
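
The same list syntax works for columns. The sketch below uses a small made-up DataFrame (the `rooms` name and its values are assumptions, not the real dataset) to drop several columns and several rows in one call each.

```python
import pandas as pd

# Toy DataFrame with hypothetical values, indexed by room_id
rooms = pd.DataFrame(
    {
        'room_type': ['Entire home/apt', 'Private room', 'Shared room'],
        'reviews': [8, 0, 63],
        'price': [57.0, 46.0, 69.0],
    },
    index=pd.Index([1, 2, 3], name='room_id'),
)

# Drop two columns at once by passing a list to `columns`
without_cols = rooms.drop(columns=['reviews', 'price'])

# Drop two rows at once by passing a list of index labels to `index`
without_rows = rooms.drop(index=[1, 2])

print(without_cols.columns.tolist())  # ['room_type']
print(without_rows.index.tolist())    # [3]
```

Neither call changes `rooms` itself, since drop returns a new DataFrame by default.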


## Inplace

Many of the methods used to transform DataFrames have a parameter called __inplace__.

With inplace=True, the method modifies the original DataFrame and returns None. With inplace=False (the default), the original DataFrame is left untouched and the method returns a new DataFrame with the transformation applied.

Let's see an example where we use inplace=False:

In [6]:
# We drop column neighborhood from df, with inplace=False
# inplace=False is the default behaviour, we're just writing it down in order to be explicit
df_new = df.drop(columns='neighborhood', inplace=False)

# Check if df has a neighborhood column
# This should print True, since we used inplace=False, thus the original DataFrame is not changed
print('df has column "neighborhood"? >>> {}'.format('neighborhood' in df.columns))

# Check if df_new has a neighborhood column
# This should print False because df_new is the transformed version of df
print('df_new has column "neighborhood"? >>> {}'.format('neighborhood' in df_new.columns))

df has column "neighborhood"? >>> True
df_new has column "neighborhood"? >>> False


And another example with inplace=True:

In [7]:
df_new = df.drop(columns='neighborhood', inplace=True)

# Check if df has a neighborhood column
# This should print False, since we used inplace=True, thus the original DataFrame is changed
print('df has column "neighborhood"? >>> {}'.format('neighborhood' in df.columns))

# With inplace=True, the return value of function drop is None
print('df_new >>> {}'.format(df_new))

df has column "neighborhood"? >>> False
df_new >>> None
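
A common alternative to inplace=True is to reassign the result to the same variable. The sketch below uses a made-up two-row DataFrame (an assumption for illustration) and also shows what happens if the same column is dropped a second time, which is what would occur if the cell above were run twice.

```python
import pandas as pd

# Toy DataFrame with hypothetical values
rooms = pd.DataFrame({'neighborhood': ['Areeiro', 'Benfica'], 'price': [29.0, 58.0]})

# Reassigning the returned DataFrame has the same visible effect as inplace=True
rooms = rooms.drop(columns='neighborhood')
print('neighborhood' in rooms.columns)  # False

# Dropping a column that no longer exists raises a KeyError
try:
    rooms.drop(columns='neighborhood')
    second_drop_failed = False
except KeyError:
    second_drop_failed = True

print(second_drop_failed)  # True
```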


## Copying DataFrames

When we're transforming DataFrames, it can be useful to keep a copy of the original DataFrame.

We can do that with the [copy](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.copy.html) method:

In [8]:
# copy function returns a copy of df
df_original = df.copy()

df_original.head()

room_id,host_id,room_type,reviews,overall_satisfaction,accommodates,bedrooms,price
6499,14455,Entire home/apt,8,5.0,2,1.0,57.0
17031,66015,Entire home/apt,0,0.0,2,1.0,46.0
25659,107347,Entire home/apt,63,5.0,3,1.0,69.0
29248,125768,Entire home/apt,225,4.5,4,1.0,58.0
29396,126415,Entire home/apt,132,5.0,4,1.0,67.0
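
It helps to contrast copy with plain assignment, which only creates a second name for the same DataFrame. The following is a minimal sketch on made-up data; the names `alias` and `snapshot` are hypothetical.

```python
import pandas as pd

# Toy DataFrame with hypothetical values
rooms = pd.DataFrame({'price': [57.0, 46.0]})

alias = rooms            # just another name for the same object
snapshot = rooms.copy()  # an independent copy

# Add a column to the original DataFrame
rooms['price_per_week'] = rooms['price'] * 7

print('price_per_week' in alias.columns)     # True, alias is the same object
print('price_per_week' in snapshot.columns)  # False, the copy is unaffected
```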


## Basic math operations

### Between a constant and a DataFrame column

The operation is applied to the column's value in every row.

For example, if we want to compute the rooms' price per week (7 nights):

In [9]:
# Creates a new column in the DataFrame (price_per_week), where each row is equal to the price * 7
df['price_per_week'] = df.price * 7
df.head()

room_id,host_id,room_type,reviews,overall_satisfaction,accommodates,bedrooms,price,price_per_week
6499,14455,Entire home/apt,8,5.0,2,1.0,57.0,399.0
17031,66015,Entire home/apt,0,0.0,2,1.0,46.0,322.0
25659,107347,Entire home/apt,63,5.0,3,1.0,69.0,483.0
29248,125768,Entire home/apt,225,4.5,4,1.0,58.0,406.0
29396,126415,Entire home/apt,132,5.0,4,1.0,67.0,469.0


### Between two DataFrame columns

The operation is performed element-wise, i.e., for each row, we apply the operation between the two columns' values.

For instance, if we want to compute the people per bedroom ratio in each room:

In [10]:
# Creates a new column in the DataFrame (people_per_bedroom), 
# where each row is equal to the value of the accommodates column divided by the bedrooms column
df['people_per_bedroom'] = df.accommodates / df.bedrooms

df.head()

room_id,host_id,room_type,reviews,overall_satisfaction,accommodates,bedrooms,price,price_per_week,people_per_bedroom
6499,14455,Entire home/apt,8,5.0,2,1.0,57.0,399.0,2.0
17031,66015,Entire home/apt,0,0.0,2,1.0,46.0,322.0,2.0
25659,107347,Entire home/apt,63,5.0,3,1.0,69.0,483.0,3.0
29248,125768,Entire home/apt,225,4.5,4,1.0,58.0,406.0,4.0
29396,126415,Entire home/apt,132,5.0,4,1.0,67.0,469.0,4.0
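
Both kinds of operation can be tried on a small made-up DataFrame. The values below are assumptions chosen for illustration, including a row with zero bedrooms to show how a division by zero behaves; whether the real dataset contains such rows is not checked here.

```python
import pandas as pd

# Toy DataFrame with hypothetical values
rooms = pd.DataFrame({
    'accommodates': [2, 4, 3],
    'bedrooms': [1.0, 2.0, 0.0],
    'price': [57.0, 46.0, 69.0],
})

# Constant and column: every price is multiplied by 7
rooms['price_per_week'] = rooms['price'] * 7

# Column and column: values are divided row by row
rooms['people_per_bedroom'] = rooms['accommodates'] / rooms['bedrooms']

print(rooms['price_per_week'].tolist())      # [399.0, 322.0, 483.0]
print(rooms['people_per_bedroom'].tolist())  # [2.0, 2.0, inf]
```

Dividing a non-zero value by zero does not raise an error; the result is inf, so it can be worth checking for such rows before using a ratio column.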


## Group by

This is a very extensive topic, and we'll only scratch its surface here, so that you know it exists and can explore it further on your own later.

In case you've worked with SQL before, you'll find this very familiar :)

In pandas, there is a process of three chained steps called [split-apply-combine](https://pandas.pydata.org/pandas-docs/stable/groupby.html):
* __split__: split the DataFrame into groups (this is the groupby)
* __apply__: apply a function to each group (aggregation, transformation, or filtration)
* __combine__: combine the results into a new DataFrame

Let's see an example!

We want to find how many rooms each landlord has.
For that, consider this smaller DataFrame:

In [11]:
df_smaller = pd.read_csv('data/airbnb_groupby.csv', index_col='room_id')

df_smaller

room_id,host_id
347518,1756107
347530,1756107
751806,3953109
785197,3953109
1015979,3953109
16238983,106149355


So the first step is to group our data by 'host_id'. This returns a DataFrameGroupBy object that by itself doesn't tell us much.

However, we can use the __groups__ property of the DataFrameGroupBy object to inspect the groups.

In [12]:
df_grouped_by_host_id = df_smaller.groupby('host_id')

df_grouped_by_host_id

<pandas.core.groupby.DataFrameGroupBy object at 0x10c07e7f0>

In [13]:
df_grouped_by_host_id.groups

{1756107: Int64Index([347518, 347530], dtype='int64', name='room_id'),
 3953109: Int64Index([751806, 785197, 1015979], dtype='int64', name='room_id'),
 106149355: Int64Index([16238983], dtype='int64', name='room_id')}

The next step is to apply a function that aggregates each group. In this case, we want the size of each group, i.e., the number of rooms in each group:

In [14]:
df_grouped_by_host_id.size()

host_id
1756107      2
3953109      3
106149355    1
dtype: int64
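
The steps above can be reproduced without the CSV file. This sketch rebuilds the six rows shown earlier by hand (the name `df_small` is an assumption) and compares groupby followed by size with value_counts, which gives the same counts for a single column.

```python
import pandas as pd

# The six rows of the smaller DataFrame, typed in by hand
df_small = pd.DataFrame(
    {'host_id': [1756107, 1756107, 3953109, 3953109, 3953109, 106149355]},
    index=pd.Index([347518, 347530, 751806, 785197, 1015979, 16238983], name='room_id'),
)

# split (groupby) + apply (size) + combine (the resulting Series)
rooms_per_host = df_small.groupby('host_id').size()

# For a single column, value_counts gives the same counts
same_counts = df_small['host_id'].value_counts().sort_index()

print(rooms_per_host.to_dict())  # {1756107: 2, 3953109: 3, 106149355: 1}
```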

Here's another example. Let's find the average room price and the maximum number of bedrooms in each neighborhood of our original dataset.

In [15]:
# Read the dataset again, as we lost the neighborhood column along the way...
df = pd.read_csv('data/airbnb_rooms.csv', index_col='room_id')

# Group data by neighborhood
df_grouped = df.groupby('neighborhood')

# Aggregate the price using the average function and aggregate the bedrooms with the max function
df_grouped.agg({'price': 'mean', 'bedrooms': 'max'})

neighborhood,price,bedrooms
Ajuda,63.435185,9.0
Alcântara,72.582160,8.0
Alvalade,70.098814,5.0
...,...,...
Santo António,83.558371,8.0
São Domingos de Benfica,203.152174,8.0
São Vicente,75.732949,9.0


## Concat

The [concat](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.concat.html) function can be used to concatenate DataFrames, either along the rows or the columns.

Let's see some examples.

Imagine we have two DataFrames, one with rooms in Areeiro and the other with rooms in Benfica.
We now want to concatenate them into a single DataFrame with all the rooms.

In [16]:
# Get all the rooms in Areeiro
df_areeiro = df[df.neighborhood == 'Areeiro']
print('We have {} rooms in Areeiro'.format(len(df_areeiro)))

# Get all the rooms in Benfica
df_benfica = df[df.neighborhood == 'Benfica']
print('We have {} rooms in Benfica'.format(len(df_benfica)))

# Create a single DataFrame by concatenating df_areeiro and df_benfica
df_areeiro_benfica = pd.concat([df_areeiro, df_benfica])

df_areeiro_benfica

We have 280 rooms in Areeiro
We have 71 rooms in Benfica


room_id,host_id,room_type,neighborhood,reviews,overall_satisfaction,accommodates,bedrooms,price
231385,1208572,Private room,Areeiro,40,5.0,3,1.0,29.0
536404,2635294,Private room,Areeiro,1,0.0,2,1.0,40.0
578118,2010790,Private room,Areeiro,3,3.5,2,1.0,23.0
...,...,...,...,...,...,...,...,...
19113703,129744959,Entire home/apt,Benfica,0,0.0,6,3.0,357.0
19147754,133878061,Private room,Benfica,0,0.0,16,5.0,58.0
19299623,71859210,Entire home/apt,Benfica,0,0.0,5,2.0,102.0
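
A quick sanity check after concatenating along the rows is that the lengths add up. The sketch below does this with two made-up DataFrames; the names `areeiro`, `benfica` and `both`, and all values, are assumptions.

```python
import pandas as pd

# Two toy DataFrames with the same columns and hypothetical values
areeiro = pd.DataFrame(
    {'neighborhood': ['Areeiro', 'Areeiro'], 'price': [29.0, 40.0]},
    index=pd.Index([10, 11], name='room_id'),
)
benfica = pd.DataFrame(
    {'neighborhood': ['Benfica'], 'price': [58.0]},
    index=pd.Index([20], name='room_id'),
)

# Stack the rows of the second DataFrame below the rows of the first
both = pd.concat([areeiro, benfica])

print(len(both) == len(areeiro) + len(benfica))  # True
print(both.index.tolist())                       # [10, 11, 20]
```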


Let's get some more data from file __airbnb_locations.csv__.

This dataset has the coordinates for each room.

In [17]:
df_locations = pd.read_csv('data/airbnb_locations.csv', index_col='room_id')

df_locations

room_id,latitude,longitude
6499,38.696747,-9.198404
17031,38.747643,-9.140850
25659,38.711671,-9.126964
...,...,...
19396300,38.725280,-9.143635
19397373,38.715371,-9.122679
19400722,38.744072,-9.140559


Our next example is to concatenate df_areeiro_benfica and df_locations along the columns. That is, we want to add the __latitude__ and __longitude__ columns to df_areeiro_benfica.

To tell concat that we want to concatenate along the columns, we use the __axis=1__ parameter.

In [18]:
# Concatenate DataFrames df_areeiro_benfica and df_locations along the columns
pd.concat([df_areeiro_benfica, df_locations], axis=1)

room_id,host_id,room_type,neighborhood,reviews,overall_satisfaction,accommodates,bedrooms,price,latitude,longitude
6499,,,,,,,,,38.696747,-9.198404
17031,,,,,,,,,38.747643,-9.140850
25659,,,,,,,,,38.711671,-9.126964
...,...,...,...,...,...,...,...,...,...,...
19396300,,,,,,,,,38.725280,-9.143635
19397373,,,,,,,,,38.715371,-9.122679
19400722,28219108.0,Entire home/apt,Areeiro,0.0,0.0,5.0,3.0,75.0,38.744072,-9.140559


But what are all those missing (NaN) values?

Some index values are found only in df_locations (and not in df_areeiro_benfica).
In these cases, concat fills the missing values with NaN.

And what if we only want to keep the rooms that exist in both DataFrames? We use the __join='inner'__ parameter.

In [19]:
# Concatenate DataFrames df_areeiro_benfica and df_locations along the columns,
# only keeping rows that exist in both DataFrames
pd.concat([df_areeiro_benfica, df_locations], axis=1, join='inner')

room_id,host_id,room_type,neighborhood,reviews,overall_satisfaction,accommodates,bedrooms,price,latitude,longitude
231385,1208572,Private room,Areeiro,40,5.0,3,1.0,29.0,38.745098,-9.138529
536404,2635294,Private room,Areeiro,1,0.0,2,1.0,40.0,38.737840,-9.131147
578118,2010790,Private room,Areeiro,3,3.5,2,1.0,23.0,38.739023,-9.132386
...,...,...,...,...,...,...,...,...,...,...
19113703,129744959,Entire home/apt,Benfica,0,0.0,6,3.0,357.0,38.749577,-9.205616
19147754,133878061,Private room,Benfica,0,0.0,16,5.0,58.0,38.749719,-9.200276
19299623,71859210,Entire home/apt,Benfica,0,0.0,5,2.0,102.0,38.755440,-9.202930
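
The difference between the default behaviour and join='inner' is easier to see on a tiny example. In this sketch, with made-up values, room 5 exists only in `locations`, so it gets a NaN price in the default (outer) result and disappears from the inner one. All names and values are assumptions.

```python
import pandas as pd

# Toy DataFrames with hypothetical values; room 5 has a location but no price
prices = pd.DataFrame(
    {'price': [29.0, 58.0]},
    index=pd.Index([10, 20], name='room_id'),
)
locations = pd.DataFrame(
    {'latitude': [38.70, 38.74, 38.75]},
    index=pd.Index([5, 10, 20], name='room_id'),
)

# Default join: keep every index label, fill the gaps with NaN
outer = pd.concat([prices, locations], axis=1)

# Inner join: keep only the index labels present in both DataFrames
inner = pd.concat([prices, locations], axis=1, join='inner')

print(len(outer), len(inner))       # 3 2
print(outer['price'].isna().sum())  # 1
```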


## Sorting the index

With the [sort_index](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.sort_index.html) function, we can sort the DataFrame along the index.

For instance, our DataFrame df is already sorted by its index, but we can re-sort it from the largest to the smallest room id, using the __ascending=False__ parameter.

In [20]:
# Original df
df.head()

room_id,host_id,room_type,neighborhood,reviews,overall_satisfaction,accommodates,bedrooms,price
6499,14455,Entire home/apt,Belém,8,5.0,2,1.0,57.0
17031,66015,Entire home/apt,Alvalade,0,0.0,2,1.0,46.0
25659,107347,Entire home/apt,Santa Maria Maior,63,5.0,3,1.0,69.0
29248,125768,Entire home/apt,Santa Maria Maior,225,4.5,4,1.0,58.0
29396,126415,Entire home/apt,Santa Maria Maior,132,5.0,4,1.0,67.0


In [21]:
# df with the index sorted from bigger to smaller room_id
df.sort_index(ascending=False)

room_id,host_id,room_type,neighborhood,reviews,overall_satisfaction,accommodates,bedrooms,price
19400722,28219108,Entire home/apt,Areeiro,0,0.0,5,3.0,75.0
19397373,97139334,Entire home/apt,São Vicente,0,0.0,4,1.0,56.0
19396300,6115933,Entire home/apt,Santo António,0,0.0,6,4.0,138.0
...,...,...,...,...,...,...,...,...
25659,107347,Entire home/apt,Santa Maria Maior,63,5.0,3,1.0,69.0
17031,66015,Entire home/apt,Alvalade,0,0.0,2,1.0,46.0
6499,14455,Entire home/apt,Belém,8,5.0,2,1.0,57.0
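
As a small self-contained sketch, here is sort_index applied in both directions to a made-up DataFrame whose index starts out unsorted. The room ids are borrowed from the preview above, but their initial order is an assumption for illustration.

```python
import pandas as pd

# Toy DataFrame whose index is deliberately not sorted
rooms = pd.DataFrame(
    {'price': [69.0, 57.0, 46.0]},
    index=pd.Index([25659, 6499, 17031], name='room_id'),
)

ascending = rooms.sort_index()                  # smallest room_id first (default)
descending = rooms.sort_index(ascending=False)  # largest room_id first

print(ascending.index.tolist())   # [6499, 17031, 25659]
print(descending.index.tolist())  # [25659, 17031, 6499]
```

Like drop, sort_index returns a new DataFrame by default and leaves `rooms` in its original order.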


## Resetting the index

We can reset the index of a DataFrame with function [reset_index](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.reset_index.html).
This will convert the index into a range from 0 to the length of the DataFrame minus 1.

As for the old index, we can either keep it as a new column in the DataFrame (__drop=False__, the default behaviour) or discard it completely (__drop=True__).

In [22]:
# Resetting the index and keeping it as a new column room_id
df.reset_index()

,room_id,host_id,room_type,neighborhood,reviews,overall_satisfaction,accommodates,bedrooms,price
0,6499,14455,Entire home/apt,Belém,8,5.0,2,1.0,57.0
1,17031,66015,Entire home/apt,Alvalade,0,0.0,2,1.0,46.0
2,25659,107347,Entire home/apt,Santa Maria Maior,63,5.0,3,1.0,69.0
...,...,...,...,...,...,...,...,...,...
13229,19396300,6115933,Entire home/apt,Santo António,0,0.0,6,4.0,138.0
13230,19397373,97139334,Entire home/apt,São Vicente,0,0.0,4,1.0,56.0
13231,19400722,28219108,Entire home/apt,Areeiro,0,0.0,5,3.0,75.0


In [23]:
# Resetting the index and dropping it -> no new column is added
df.reset_index(drop=True)

,host_id,room_type,neighborhood,reviews,overall_satisfaction,accommodates,bedrooms,price
0,14455,Entire home/apt,Belém,8,5.0,2,1.0,57.0
1,66015,Entire home/apt,Alvalade,0,0.0,2,1.0,46.0
2,107347,Entire home/apt,Santa Maria Maior,63,5.0,3,1.0,69.0
...,...,...,...,...,...,...,...,...
13229,6115933,Entire home/apt,Santo António,0,0.0,6,4.0,138.0
13230,97139334,Entire home/apt,São Vicente,0,0.0,4,1.0,56.0
13231,28219108,Entire home/apt,Areeiro,0,0.0,5,3.0,75.0
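
The two options can be compared side by side on a two-row example. The DataFrame below is made up for illustration, and the names `kept` and `dropped` are hypothetical.

```python
import pandas as pd

# Toy DataFrame with hypothetical values, indexed by room_id
rooms = pd.DataFrame(
    {'price': [57.0, 46.0]},
    index=pd.Index([6499, 17031], name='room_id'),
)

kept = rooms.reset_index()              # old index becomes the room_id column
dropped = rooms.reset_index(drop=True)  # old index is discarded

print(kept.columns.tolist())     # ['room_id', 'price']
print(dropped.columns.tolist())  # ['price']
print(kept.index.tolist())       # [0, 1]
```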


## Setting the index

With function [set_index](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.set_index.html), we can set a new index for our DataFrame.

The old index is discarded.

In this function, __drop=True__ (the default behaviour) removes the column used as the new index from the DataFrame's columns, while __drop=False__ also keeps it as a regular column.

In [24]:
# Setting column neighborhood as the new index
# The neighborhood column is dropped from the DataFrame, this is the default behaviour
df.set_index('neighborhood', drop=True)

neighborhood,host_id,room_type,reviews,overall_satisfaction,accommodates,bedrooms,price
Belém,14455,Entire home/apt,8,5.0,2,1.0,57.0
Alvalade,66015,Entire home/apt,0,0.0,2,1.0,46.0
Santa Maria Maior,107347,Entire home/apt,63,5.0,3,1.0,69.0
...,...,...,...,...,...,...,...
Santo António,6115933,Entire home/apt,0,0.0,6,4.0,138.0
São Vicente,97139334,Entire home/apt,0,0.0,4,1.0,56.0
Areeiro,28219108,Entire home/apt,0,0.0,5,3.0,75.0


In [25]:
# Setting column neighborhood as the new index
# The neighborhood column is NOT dropped from the DataFrame
df.set_index('neighborhood', drop=False)

neighborhood,host_id,room_type,neighborhood,reviews,overall_satisfaction,accommodates,bedrooms,price
Belém,14455,Entire home/apt,Belém,8,5.0,2,1.0,57.0
Alvalade,66015,Entire home/apt,Alvalade,0,0.0,2,1.0,46.0
Santa Maria Maior,107347,Entire home/apt,Santa Maria Maior,63,5.0,3,1.0,69.0
...,...,...,...,...,...,...,...,...
Santo António,6115933,Entire home/apt,Santo António,0,0.0,6,4.0,138.0
São Vicente,97139334,Entire home/apt,São Vicente,0,0.0,4,1.0,56.0
Areeiro,28219108,Entire home/apt,Areeiro,0,0.0,5,3.0,75.0
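
Since set_index discards the old index, one way to keep the room ids is to call reset_index first, so that room_id becomes a regular column before the new index is set. This is a sketch on made-up data, and the variable names are assumptions.

```python
import pandas as pd

# Toy DataFrame with hypothetical values, indexed by room_id
rooms = pd.DataFrame(
    {'neighborhood': ['Belém', 'Alvalade'], 'price': [57.0, 46.0]},
    index=pd.Index([6499, 17031], name='room_id'),
)

# The room_id index is lost here
by_neighborhood = rooms.set_index('neighborhood')

# Moving room_id into a column first preserves it
kept_room_id = rooms.reset_index().set_index('neighborhood')

print(by_neighborhood.columns.tolist())  # ['price']
print(kept_room_id.columns.tolist())     # ['room_id', 'price']
```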


## Sorting values

The [sort_values](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.sort_values.html) function can be used to sort the DataFrame by the values of a column.

For instance, let's sort df from the cheapest to the most expensive rooms:

In [26]:
df.sort_values('price')

room_id,host_id,room_type,neighborhood,reviews,overall_satisfaction,accommodates,bedrooms,price
8422256,29862851,Private room,Alvalade,0,0.0,1,1.0,10.0
13116032,72951043,Shared room,Arroios,1,0.0,8,1.0,10.0
13342103,3518523,Entire home/apt,Santa Maria Maior,52,4.5,4,1.0,10.0
...,...,...,...,...,...,...,...,...
3091854,15724745,Entire home/apt,Campo de Ourique,0,0.0,6,2.0,3460.0
3067648,15610125,Entire home/apt,Benfica,0,0.0,7,3.0,4037.0
7800900,2214817,Entire home/apt,Misericórdia,6,5.0,4,1.0,7496.0
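
sort_values also accepts __ascending=False__ and a list of columns. The sketch below tries both on a made-up DataFrame; the values and variable names are assumptions for illustration.

```python
import pandas as pd

# Toy DataFrame with hypothetical values, indexed by room_id
rooms = pd.DataFrame(
    {'neighborhood': ['Benfica', 'Areeiro', 'Areeiro'], 'price': [58.0, 40.0, 29.0]},
    index=pd.Index([20, 11, 10], name='room_id'),
)

# Most expensive rooms first
most_expensive_first = rooms.sort_values('price', ascending=False)

# Sort by neighborhood, then by price within each neighborhood
by_area_then_price = rooms.sort_values(['neighborhood', 'price'])

print(most_expensive_first.index.tolist())  # [20, 11, 10]
print(by_area_then_price.index.tolist())    # [10, 11, 20]
```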
