<a href="https://colab.research.google.com/github/MonkeyWrenchGang/MGTPython/blob/main/module_3/3_1_crunching_w_pandas.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Crunching Data w. Pandas


---

Pandas is our library for data manipulation and analysis. It provides a variety of tools for "crunching" or "aggregating" data.

The most common way to aggregate data in Pandas is with the `.groupby()` and `.agg()` (or `.aggregate()`) methods; `agg` is short for `aggregate`. These methods allow you to group your data by one or more columns, and then perform a variety of aggregation operations on the groups. Some common aggregation operations include:

- mean(): calculates the mean of each group
- sum(): calculates the sum of each group
- count(): counts the number of non-NA/null values in each group
- min(): finds the minimum value in each group
- max(): finds the maximum value in each group

The `agg()` function allows us to perform multiple aggregation operations at once.
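
As a minimal sketch of the pattern described above, the toy DataFrame below is grouped by one column and summarized with several aggregations at once. The `toy` frame and its `city` and `price` values are made up for illustration and are not part of the AirBNB data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; one price is missing on purpose.
toy = pd.DataFrame({
    "city": ["A", "A", "B", "B", "B"],
    "price": [100.0, 200.0, 50.0, np.nan, 150.0],
})

# Group by one column, then apply several aggregations to "price" in one call.
summary = toy.groupby("city")["price"].agg(["mean", "sum", "count", "min", "max"])
print(summary)
```

Note that `count` for city `B` is 2, not 3, because the missing price is skipped.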

To get started crunching, we'll first load our data into a DataFrame, explore the DataFrame, and then use the `groupby` and `agg` functions to analyze the data. 


---


In this tutorial we'll perform the following
1. import the data
2. analyze missing values 
  - by row
  - by column 
3. aggregate 
  - single column 
  - multiple columns 
4. groupby and aggregate 
  - single column groupby single aggregate 
  - single column groupby multiple aggregates 
  - multiple column groupby single & multiple aggregate 
5. combining with query()
  - filter rows 


## 0. Load Libraries

```python
import datetime as dt
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline
```

# 1. Import Data

San Diego, California, is a popular vacation destination known for its beautiful beaches and warm weather. As a result, the city has a thriving AirBNB market with a wide variety of properties available for rent. According to data from Inside AirBNB, a company that tracks AirBNB data, San Diego had over 12,000 active listings as of December 2022. These listings included everything from private rooms to entire homes and apartments. 

Here we are going to import the listing data for San Diego, California, AirBNBs. 

http://insideairbnb.com/san-diego/

```python
abnb = pd.read_csv("https://raw.githubusercontent.com/MonkeyWrenchGang/MGTPython/main/module_3/data/sd_listings.csv")
abnb.head()
```


## Our Challenge 

The San Diego AirBNB market is a strong and profitable option for both hosts and guests looking for a comfortable and convenient way to experience the city. We have several questions to answer:

0. How many nulls are there in each **column** and how many nulls are there in each **row**?
1. What is the average daily rate for a San Diego AirBNB?
  - what is the average for San Diego
  - what is the count, min, max, mean, median for San Diego.
2. What neighbourhoods have the highest average prices?
3. What neighbourhoods have the most properties, i.e. count sorted?
4. What is the mean/min/max price and count of properties by neighbourhood where room_type == "Private room"?
5. Suppose we want to find properties in Mission Bay where room_type == "Private room" that cost more than the mean price of private rooms in Mission Bay?
  - change city for La Jolla
  - change city for Pacific Beach 



 




## Info
The `info()` method is used to display a summary of the DataFrame, including the number of rows, number of columns, name of columns, non-null values, and data types of the columns.

This is one way we can get the number of non-nulls by column, but there is an even better way. 


```python
abnb.info()
```

# 0. Count Nulls


---

How many nulls are there in each **column** and how many nulls are there in each **row**? Using the Pandas library, count the number of null values in a DataFrame by using the `isnull()` method followed by the `sum(axis=0)` or `sum(axis=1)` method.

1. produce a result of nulls by column - are there columns that contain 100% null values? 
2. create a new column "null_row_count" - which is the count of the number of nulls for each row. For example, if neighbourhood and price were both null in a row, the count would be 2. Then filter for rows with > 4 nulls. 

## What is the axis = parameter?
In Pandas, the **axis = parameter** specifies the axis along which an operation is applied.

- `axis = 0` refers to the row axis and is the default value for most operations. When using axis=0, the operation runs down the rows of each column and produces one result per column. This will give you the null counts by column. 

- `axis = 1` refers to the column axis. When using axis=1, the operation runs across the columns of each row and produces one result per row. This will give you the null counts for each row of data. 


Here's an example:

```python
# count nulls in each column
abnb.isnull().sum(axis=0)

# count nulls in each row
abnb.isnull().sum(axis=1)

# create a column "null_row_count" which contains the number of nulls found in each row of data
abnb["null_row_count"] = abnb.isnull().sum(axis=1)
abnb.sort_values("null_row_count", ascending=False).query('null_row_count > 4')
```
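
To see what the two axis settings return, here is a small self-contained sketch. The `toy` DataFrame and its values are hypothetical; only `isnull()`, `sum()`, and `mean()` come from Pandas.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with a few missing values.
toy = pd.DataFrame({
    "neighbourhood": ["La Jolla", None, "Mission Bay"],
    "price": [250.0, np.nan, np.nan],
    "reviews": [10, 3, 7],
})

nulls_by_column = toy.isnull().sum(axis=0)  # one count per column
nulls_by_row = toy.isnull().sum(axis=1)     # one count per row

# Share of nulls per column in percent; 100.0 would mean the column is entirely null.
pct_null_by_column = toy.isnull().mean() * 100

print(nulls_by_column)
print(nulls_by_row)
print(pct_null_by_column)
```

The per-column result is indexed by column name, while the per-row result shares the DataFrame's row index, which is why it can be assigned directly as a new column.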


# 1. What is the average daily rate(price) for a San Diego AirBNB? 


---

- calculate the mean of price simply using `.mean()` - this operates on a series object (single column) 
- calculate the min,max,mean,median,and count using `.aggregate()` - this operates on a dataframe object 

[single-bracket] RETURNS A SERIES.

```python
mean_price = abnb["price"].mean()
print("the mean price is {:.2f}".format(mean_price))
```



### Using .aggregate()

```python
abnb["price"].aggregate(['mean', 'median', 'min', 'max', 'count'])
```
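
The difference between single and double brackets can be checked directly. This is a sketch on a hypothetical `toy` DataFrame, not on the AirBNB data.

```python
import pandas as pd

# Hypothetical toy data.
toy = pd.DataFrame({"price": [100, 200, 300], "minimum_nights": [1, 2, 3]})

as_series = toy["price"]    # single brackets -> Series
as_frame = toy[["price"]]   # double brackets -> DataFrame with one column

# Aggregating a Series gives a Series indexed by statistic name.
series_stats = as_series.agg(["mean", "min", "max"])

# Aggregating a DataFrame gives a DataFrame: one row per statistic, one column per input column.
frame_stats = as_frame.agg(["mean", "min", "max"])

print(type(as_series).__name__, type(as_frame).__name__)
print(series_stats)
print(frame_stats)
```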

# 2. what neighbourhoods have the highest average prices? 

Let's break this down: 
  - first we need to summarize by neighbourhood 
  - now it gets a little tricky: we need to sort by mean price, but the aggregated result has multi-level column names (a MultiIndex), which makes referencing a column more difficult. 
    - use `sort_values(..., ascending=False)` to sort largest to smallest
    - to reference a multi-level column we use a tuple `("first level", "second level")`, for example:

  ```python
  data = {'Name': ['John', 'Jane', 'Sam', 'John', 'Jane'],
        'Age': [32, 28, 45, 32, 28],
        'Salary': [50000, 60000, 70000, 50000, 60000],
        'Department': ['IT', 'HR', 'Finance', 'IT', 'HR']
       }
  df = pd.DataFrame(data)

  ```

  ```python
  grouped_df = df.groupby(['Department']).agg({"Salary":["mean","min","max"]})
  print(grouped_df.columns)
  ```

- This will produce a set of columns that looks like this:

  ```text
MultiIndex([('Salary', 'mean'),
            ('Salary',  'min'),
            ('Salary',  'max')],
           )
  ```

- which you need to reference like this if you want the mean of salary: 
  
  ```python
  grouped_df[('Salary', 'mean')]

  ```


---



## About: GroupBy() and Aggregate():
The `groupby()` method is used to ***group rows*** of a DataFrame based on **one or more columns**. Once the DataFrame is grouped, you can apply various aggregate functions to the groups, such as sum, mean, count, etc.

The `agg()` function is used to perform the aggregate functions after groupby. It can take multiple aggregate functions and apply on different columns.

NOTE: A MultiIndex, or Multi-level Index, in pandas is a way to represent and manipulate higher-dimensional data in a DataFrame or Series. It allows you to have multiple levels of indexing on an axis, rather than just a single level of indexing as in a regular DataFrame

for example: 

```python
# single column selected, then aggregate.
abnb.groupby("neighbourhood")["price"].aggregate(['mean', 'median', 'min', 'max', 'count'])

# apply aggregate to one or more numeric columns. 
abnb.groupby("neighbourhood").aggregate({ "price":
    ['mean', 'median', 'min', 'max', 'count']})

```

To sort by a multi-level column, we need to reference both the first and second level of the aggregate as a tuple:

```python
abnb.groupby("neighbourhood").aggregate({ "price":
    ['mean', 'median', 'min', 'max', 'count']}).sort_values(("price","mean"), ascending=False)
```
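
If the two-level column names feel awkward, one alternative (not used elsewhere in this tutorial) is Pandas' named aggregation, which produces flat column names. The sketch below uses a hypothetical `toy` DataFrame, and the output names `mean_price`, `max_price`, and `n` are arbitrary choices.

```python
import pandas as pd

# Hypothetical toy data.
toy = pd.DataFrame({
    "neighbourhood": ["X", "X", "Y", "Y"],
    "price": [100, 300, 50, 70],
})

# Named aggregation: new_column_name=(source_column, aggregation)
flat = (
    toy.groupby("neighbourhood")
    .agg(
        mean_price=("price", "mean"),
        max_price=("price", "max"),
        n=("price", "count"),
    )
    .sort_values("mean_price", ascending=False)  # plain string, no tuple needed
)
print(flat)
```

The trade-off is that each output column must be spelled out, whereas the dictionary form generates the columns for you.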


---


### 2. what neighbourhoods have the highest average prices? 
- first we need to summarize by neighborhood 
- we need to sort by mean price using 'sort_values(, ascending=False)' and sort largest to smallest.
    

## 2.1 Suppose I want to reference a column where a multi-level-index is present?

1. look at the columns with `res2.columns`; this will give you the column names
2. use `[(level1, level2)]` to reference the column and return a series. 
3. use bracket-bracket `[[(level1, level2)]]` to reference the column and return a dataframe  

Here's an example:

```python
# create multi-level-index
res2 = abnb.groupby("neighbourhood").aggregate({ "price":
    ['mean', 'median', 'min', 'max', 'count']}).sort_values(("price","mean"), ascending=False)

# print columns 
print(res2.columns)

# returns a series
res2[('price','mean')]

# returns a dataframe 
res2[[('price','mean')]]

# returns a dataframe w. all the aggregates 
res2[["price"]]
```

# 3. What neighborhoods have the most properties?

- here we need to count by neighbourhood
- what are we going to count? let's count the "id" column
- we'll need to sort descending (largest to smallest), again by a multi-level column 

QUESTION: can't I just use value_counts()? Of course, that's the easiest! 

```python
# easy
abnb['neighbourhood'].value_counts()
# using groupby and .agg
abnb.groupby('neighbourhood').agg({"id":["count"]}).sort_values(("id","count"),ascending=False)

```
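
A quick way to convince yourself that both approaches agree is to run them on a small example. The `toy` DataFrame below is hypothetical.

```python
import pandas as pd

# Hypothetical toy data: five listings in two neighbourhoods.
toy = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "neighbourhood": ["X", "Y", "X", "X", "Y"],
})

# value_counts() returns a Series already sorted from most to least frequent.
by_value_counts = toy["neighbourhood"].value_counts()

# groupby + agg returns a DataFrame with a multi-level column ("id", "count").
by_groupby = (
    toy.groupby("neighbourhood")
    .agg({"id": ["count"]})
    .sort_values(("id", "count"), ascending=False)
)

print(by_value_counts)
print(by_groupby)
```

The groupby version is more verbose, but it extends naturally when you want additional aggregates alongside the count.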





# 4. What is the mean/min/max price and count of properties by neighbourhood where room_type == "Private room"  


---


Filter for room_type == "Private room": what is the count of properties and the min/max/mean price by neighbourhood, sorted by mean price descending (largest to smallest)? 

- start by filtering for room_type == "Private room"
- group by neighbourhood
- aggregate count id, and get the price min/max/mean 
- sort by mean price ascending = False 

```python
# straight forward
abnb.query('room_type == "Private room"').groupby("neighbourhood").agg({
    "id":"count",
    "price":["min","max","mean"]
}).sort_values(('price','mean'),ascending=False)

# alternative: also compute the median, sort by it, and reset the index
res4 = abnb.query('room_type == "Private room"').groupby("neighbourhood").agg({
    "id":"count",
    "price":["min","max","mean","median"]
}).sort_values(('price','median'),ascending=False).reset_index()

res4
```


# 5. Suppose we want to find properties in Mission Bay where room_type == "Private room" that cost more than the mean price of private rooms in Mission Bay? 

Here we have a two-step process (though you could probably do it in one): 

- figure out the mean price of private rooms in Mission Bay
- query for Mission Bay private rooms where price > mean_price 




---
Passing values to a query: 

In the `df.query()` method, the `@` symbol is used to pass a variable to the query string. For example, if you have a variable "x" that you want to use in your query, you would pass it like this: `df.query("column_name > @x")`. This will replace the "@x" in the query string with the value of the variable x when the query is executed. This allows you to use **dynamic values in your queries**, rather than hardcoding them into the query string.

```python
# figure out mean price
mean_price = abnb.query('neighbourhood == "Mission Bay" and room_type == "Private room"')["price"].mean()
mean_price

# query for properties > mean price
abnb.query('neighbourhood == "Mission Bay" and room_type == "Private room" and price > @mean_price')[["name","room_type","price","minimum_nights"]]

```
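
The same two-step pattern can be tried on a small example. This is a sketch with a hypothetical `toy` DataFrame; the variable names `city` and `city_mean` are illustrative.

```python
import pandas as pd

# Hypothetical toy data.
toy = pd.DataFrame({
    "neighbourhood": ["X", "X", "X", "Y"],
    "price": [100, 200, 600, 900],
})

city = "X"

# Step 1: compute the mean price for the chosen neighbourhood.
city_mean = toy.query("neighbourhood == @city")["price"].mean()

# Step 2: reuse both Python variables inside the query string via @.
above_mean = toy.query("neighbourhood == @city and price > @city_mean")

print(city_mean)
print(above_mean)
```

Here the mean for `X` is 300.0, so only the 600 listing is returned. A single very expensive listing raises the mean noticeably, which is worth keeping in mind when reading the real results.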


| | id | name | host_id | host_name | neighbourhood_group | neighbourhood | latitude | longitude | room_type | price | minimum_nights | number_of_reviews | last_review | reviews_per_month | calculated_host_listings_count | availability_365 | number_of_reviews_ltm | license | null_row_count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3606 | 29796733 | PB LIFE | 18019435 | Marcelo | NaN | Mission Bay | 32.79641 | -117.22808 | Private room | 10000 | 2 | 1 | 2019-07-07 | 0.02 | 1 | 0 | 0 | NaN | 2 |
| 6440 | 47475849 | WELCOME to "WORLD MAP SUITE" in the heart of P... | 380705136 | Duda | NaN | Mission Bay | 32.7919 | -117.23807 | Private room | 100000 | 1 | 1 | 2021-01-09 | 0.04 | 1 | 0 | 0 | NaN | 2 |


## 5a. Change to La Jolla

```python
city = "La Jolla"
mean_price = abnb.query('neighbourhood == @city and room_type == "Private room"')["price"].mean()
print("The mean price in {} is {:.2f}".format(city, mean_price))
print("-------")
abnb.query('neighbourhood == @city and room_type == "Private room" and price > @mean_price')[["name","room_type","price","minimum_nights"]]
```

## 5b. Change to Pacific Beach
Repeat, but change the neighbourhood to Pacific Beach and add the condition minimum_nights == 1.

# 6. General Aggregation Stuff

The following are some examples of aggregations - take a few minutes and work through some of them. 

```python
abnb.describe()
```

| | id | host_id | neighbourhood_group | latitude | longitude | price | minimum_nights | number_of_reviews | reviews_per_month | calculated_host_listings_count | availability_365 | number_of_reviews_ltm | null_row_count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 12781.0 | 12781.0 | 0.0 | 12781.0 | 12781.0 | 12781.0 | 12781.0 | 12781.0 | 10913.0 | 12781.0 | 12781.0 | 12781.0 | 12781.0 |
| mean | 2.062873e+17 | 142762200.0 | NaN | 32.768311 | -117.182792 | 328.618653 | 6.998435 | 55.830843 | 1.950384 | 16.333229 | 187.809561 | 16.518582 | 2.166184 |
| std | 3.125871e+17 | 149289300.0 | NaN | 0.064726 | 0.064342 | 1017.10205 | 19.87715 | 91.635858 | 1.799772 | 34.104763 | 129.845985 | 20.978197 | 0.791634 |
| min | 6.0 | 29.0 | NaN | 32.54076 | -117.28171 | 0.0 | 1.0 | 0.0 | 0.01 | 1.0 | 0.0 | 0.0 | 1.0 |
| 25% | 25452340.0 | 22737940.0 | NaN | 32.72633 | -117.24841 | 119.0 | 1.0 | 3.0 | 0.51 | 1.0 | 73.0 | 1.0 | 2.0 |
| 50% | 47033440.0 | 81541740.0 | NaN | 32.75881 | -117.17071 | 193.0 | 2.0 | 17.0 | 1.42 | 3.0 | 176.0 | 8.0 | 2.0 |
| 75% | 6.016727e+17 | 234239200.0 | NaN | 32.79684 | -117.14231 | 357.0 | 4.0 | 66.0 | 2.93 | 14.0 | 324.0 | 26.0 | 2.0 |
| max | 7.876679e+17 | 492023800.0 | NaN | 33.10179 | -116.93424 | 100000.0 | 999.0 | 966.0 | 14.14 | 200.0 | 365.0 | 219.0 | 5.0 |


# 6.a Suppose you want to aggregate only numeric data

In the pandas library, the `select_dtypes()` method is used to filter a DataFrame by selecting columns with a specific data type. The method takes one or more data types as an argument, and returns a new DataFrame that contains only the columns of the original DataFrame that have a matching data type.

For example, if you have a DataFrame df and you want to select all columns that have a data type of float, you would use the following code:

```python
# only floats
float_columns = df.select_dtypes(include='float')

# or multiple types 
columns = df.select_dtypes(include=['float', 'int'])


```
You can also use the exclude parameter to exclude the columns of a certain dtype.
```python
no_float_columns = df.select_dtypes(exclude='float')

```
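
As a self-contained sketch of this idea, the example below selects every numeric column at once with `include="number"` and then aggregates. The `toy` DataFrame is hypothetical.

```python
import pandas as pd

# Hypothetical toy data with one text column and two numeric columns.
toy = pd.DataFrame({
    "name": ["a", "b", "c"],
    "price": [10.5, 20.0, 30.5],
    "minimum_nights": [1, 2, 2],
})

# "number" matches all numeric dtypes (integers and floats of any width).
numeric_only = toy.select_dtypes(include="number")

stats = numeric_only.agg(["mean", "min", "max", "nunique"])
print(stats)
```

Using `"number"` avoids having to know whether a column was loaded as `int64`, `float64`, or another numeric width.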

Suppose you want to get the mean, min, max, and nunique of floats and integers? 

```python
abnb.select_dtypes(include= ["float64", 'int64']).agg( ['mean','min','max', 'nunique'] )
```

| | id | host_id | neighbourhood_group | latitude | longitude | price | minimum_nights | number_of_reviews | reviews_per_month | calculated_host_listings_count | availability_365 | number_of_reviews_ltm | null_row_count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| mean | 2.062873e+17 | 142762200.0 | NaN | 32.768311 | -117.182792 | 328.618653 | 6.998435 | 55.830843 | 1.950384 | 16.333229 | 187.809561 | 16.518582 | 2.166184 |
| min | 6.0 | 29.0 | NaN | 32.54076 | -117.28171 | 0.0 | 1.0 | 0.0 | 0.01 | 1.0 | 0.0 | 0.0 | 1.0 |
| max | 7.876679e+17 | 492023800.0 | NaN | 33.10179 | -116.93424 | 100000.0 | 999.0 | 966.0 | 14.14 | 200.0 | 365.0 | 219.0 | 5.0 |
| nunique | 12781.0 | 5972.0 | 0.0 | 9114.0 | 8686.0 | 1250.0 | 54.0 | 508.0 | 821.0 | 52.0 | 366.0 | 129.0 | 5.0 |


# 6.b Suppose you want to select only "minimum_nights", "price", "number_of_reviews" and get the mean, min, max of those columns?

You would simply select with double brackets: `[["minimum_nights", "price", "number_of_reviews"]]`

```python
abnb[["minimum_nights", "price", "number_of_reviews" ]].agg( ['mean','min','max'] )
```

| | minimum_nights | price | number_of_reviews |
|---|---|---|---|
| mean | 6.998435 | 328.618653 | 55.830843 |
| min | 1.0 | 0.0 | 0.0 |
| max | 999.0 | 100000.0 | 966.0 |


# 6.c Filter for neighbourhood in "La Jolla" and "Pacific Beach" for private rooms, group by neighbourhood, and aggregate the following columns: 
  - price : ["min", "max", "mean"]
  - number_of_reviews : ["sum","min","max","mean"]
  - id : ["count"]
  
Use a dictionary approach to aggregate.


---
Think about what we'll need to do:
1. filter for neighbourhood == ["La Jolla", "Pacific Beach"]
2. and room_type == "Private room"
3. group by neighbourhood
4. aggregate ({ "price" : ["min", "max", "mean"],
  "number_of_reviews" : ["sum","min","max","mean"],
  "id": ["count"]})


```python
# one long code
abnb.query('neighbourhood == ["La Jolla", "Pacific Beach"] and room_type == "Private room"').groupby("neighbourhood").agg({ "price" : ["min", "max", "mean"],
  "number_of_reviews" : ["sum","min","max","mean"],
  "id": ["count"]})

# break it up
abnb.query('neighbourhood == ["La Jolla", "Pacific Beach"] and room_type == "Private room"')\
.groupby("neighbourhood")\
.agg({ "price" : ["min", "max", "mean"],
  "number_of_reviews" : ["sum","min","max","mean"],
  "id": ["count"]})


```

```python
# break it up
abnb\
.query('neighbourhood == ["La Jolla", "Pacific Beach"] and room_type == "Private room"')\
.groupby("neighbourhood")\
.agg({ "price" : ["min", 
                  "max", "mean"],
      "number_of_reviews" : ["sum","min","max","mean"],
      "id": ["count"]})
```

| neighbourhood | price (min) | price (max) | price (mean) | number_of_reviews (sum) | number_of_reviews (min) | number_of_reviews (max) | number_of_reviews (mean) | id (count) |
|---|---|---|---|---|---|---|---|---|
| La Jolla | 47 | 647 | 173.566667 | 5132 | 0 | 355 | 85.533333 | 60 |
| Pacific Beach | 30 | 1250 | 140.464646 | 4661 | 0 | 346 | 47.080808 | 99 |


```python
# you can also pass instructions in.
my_agg_dict = {"minimum_nights": ['mean','min','max'], 
           "price" : ['mean','min','max'], 
           "number_of_reviews" : ["mean", "median", "count"]}

abnb.agg(my_agg_dict)       
```

| | minimum_nights | price | number_of_reviews |
|---|---|---|---|
| mean | 6.998435 | 328.618653 | 55.830843 |
| min | 1.0 | 0.0 | NaN |
| max | 999.0 | 100000.0 | NaN |
| median | NaN | NaN | 17.0 |
| count | NaN | NaN | 12781.0 |
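
The missing values in the output above appear wherever a statistic was not requested for a column: the result has one row per statistic named anywhere in the dictionary, and Pandas fills the combinations that were not asked for with NaN. A small sketch on a hypothetical `toy` DataFrame shows the same behaviour.

```python
import pandas as pd

# Hypothetical toy data.
toy = pd.DataFrame({"price": [100, 200, 300], "reviews": [1, 5, 9]})

# "median" is only requested for reviews; "mean" and "max" only for price.
result = toy.agg({"price": ["mean", "max"], "reviews": ["median"]})
print(result)

# Cells for combinations that were not requested are NaN.
print(result.isna())
```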


# 6.d Group By Aggregation

As simple as it gets

```python
abnb.groupby(["room_type"])[["price"]].agg( ['mean','min', 'max'] )
```

| room_type | price (mean) | price (min) | price (max) |
|---|---|---|---|
| Entire home/apt | 352.064804 | 10 | 10000 |
| Hotel room | 80.75 | 0 | 210 |
| Private room | 212.881726 | 17 | 100000 |
| Shared room | 219.24359 | 20 | 9999 |


## 6.e fancier with agg practice

```python
abnb.groupby(["room_type"]).agg( {"minimum_nights": ['mean','min','max'], 
           "price" : ['mean','min','max'], 
           "number_of_reviews" : ["mean", "median", "count"]} )
```

| room_type | minimum_nights (mean) | minimum_nights (min) | minimum_nights (max) | price (mean) | price (min) | price (max) | number_of_reviews (mean) | number_of_reviews (median) | number_of_reviews (count) |
|---|---|---|---|---|---|---|---|---|---|
| Entire home/apt | 6.837472 | 1 | 365 | 352.064804 | 10 | 10000 | 56.997366 | 19.0 | 10632 |
| Hotel room | 1.375 | 1 | 2 | 80.75 | 0 | 210 | 50.625 | 37.0 | 8 |
| Private room | 7.85555 | 1 | 999 | 212.881726 | 17 | 100000 | 49.869123 | 12.0 | 2063 |
| Shared room | 6.846154 | 1 | 300 | 219.24359 | 20 | 9999 | 55.038462 | 14.0 | 78 |


## Stack() and Reset Index


---

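Dealing with multi-level indexing: `stack()` moves the inner column level (the statistic names) into the row index, and `reset_index()` then turns the index levels into ordinary columns. 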





```python
abnb.groupby(["room_type"]).agg( {"minimum_nights": ['mean','min','max'], 
           "price" : ['mean','min','max'], 
           "number_of_reviews" : ["mean", 'min','max']} ).stack().reset_index()
```

| | room_type | level_1 | minimum_nights | price | number_of_reviews |
|---|---|---|---|---|---|
| 0 | Entire home/apt | mean | 6.837472 | 352.064804 | 56.997366 |
| 1 | Entire home/apt | min | 1.0 | 10.0 | 0.0 |
| 2 | Entire home/apt | max | 365.0 | 10000.0 | 909.0 |
| 3 | Hotel room | mean | 1.375 | 80.75 | 50.625 |
| 4 | Hotel room | min | 1.0 | 0.0 | 0.0 |
| 5 | Hotel room | max | 2.0 | 210.0 | 180.0 |
| 6 | Private room | mean | 7.85555 | 212.881726 | 49.869123 |
| 7 | Private room | min | 1.0 | 17.0 | 0.0 |
| 8 | Private room | max | 999.0 | 100000.0 | 966.0 |
| 9 | Shared room | mean | 6.846154 | 219.24359 | 55.038462 |
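
To make the reshaping easier to follow, here is a minimal sketch on a hypothetical `toy` DataFrame. It prints the wide result with two-level columns first and then the stacked version. Depending on your Pandas version, `stack()` may emit a FutureWarning about its implementation; the resulting values are the same here.

```python
import pandas as pd

# Hypothetical toy data.
toy = pd.DataFrame({
    "room_type": ["Private", "Private", "Shared", "Shared"],
    "price": [100, 200, 30, 50],
    "minimum_nights": [1, 3, 2, 2],
})

# Wide result: columns are (variable, statistic) pairs.
wide = toy.groupby("room_type").agg({
    "price": ["min", "max"],
    "minimum_nights": ["min", "max"],
})
print(wide)

# stack() moves the statistic level into the row index;
# reset_index() turns room_type and the statistic into regular columns.
tidy = wide.stack().reset_index()
print(tidy)
```

The unnamed statistic level becomes a column called `level_1` after `reset_index()`, which you can filter on like any other column, for example `tidy.query('level_1 == "max"')`.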




## 6.f Multi-Level Group by 


---



```python
abnb.groupby(["neighbourhood","room_type"]).agg( {"minimum_nights": ['mean','min','max'], 
           "price" : ['mean','min','max'], 
           "number_of_reviews" : ['mean','min','max']} )
```

| neighbourhood | room_type | minimum_nights (mean) | minimum_nights (min) | minimum_nights (max) | price (mean) | price (min) | price (max) | number_of_reviews (mean) | number_of_reviews (min) | number_of_reviews (max) |
|---|---|---|---|---|---|---|---|---|---|---|
| Allied Gardens | Entire home/apt | 10.428571 | 1 | 30 | 359.785714 | 81 | 1112 | 32.785714 | 0 | 258 |
| Allied Gardens | Private room | 6.000000 | 1 | 29 | 79.500000 | 56 | 161 | 57.333333 | 0 | 210 |
| Alta Vista | Entire home/apt | 2.200000 | 1 | 5 | 237.000000 | 112 | 393 | 36.800000 | 0 | 157 |
| Alta Vista | Private room | 1.250000 | 1 | 2 | 79.250000 | 41 | 180 | 137.000000 | 0 | 466 |
| Amphitheater And Water Park | Private room | 2.500000 | 1 | 5 | 69.250000 | 55 | 86 | 33.250000 | 6 | 71 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| West University Heights | Private room | 3.233333 | 1 | 30 | 385.900000 | 36 | 7181 | 51.866667 | 0 | 323 |
| Wooded Area | Entire home/apt | 7.907407 | 1 | 31 | 514.148148 | 80 | 2000 | 40.351852 | 0 | 250 |
| Wooded Area | Private room | 1.500000 | 1 | 2 | 129.500000 | 129 | 130 | 30.000000 | 1 | 59 |
| Yosemite Dr | Entire home/apt | 1.500000 | 1 | 2 | 80.500000 | 75 | 86 | 306.000000 | 236 | 376 |


## Filtering w. Group by and Aggregate

My recommendation is to filter rows up front using `query()`, before grouping and aggregating.

```python
abnb.query('room_type == "Private room"').groupby(["neighbourhood","room_type"]).agg( {"minimum_nights": ['mean','min','max'], 
           "price" : ['mean','min','max'], 
           "number_of_reviews" : ['mean','min','max']} )
```

| neighbourhood | room_type | minimum_nights (mean) | minimum_nights (min) | minimum_nights (max) | price (mean) | price (min) | price (max) | number_of_reviews (mean) | number_of_reviews (min) | number_of_reviews (max) |
|---|---|---|---|---|---|---|---|---|---|---|
| Allied Gardens | Private room | 6.000000 | 1 | 29 | 79.500000 | 56 | 161 | 57.333333 | 0 | 210 |
| Alta Vista | Private room | 1.250000 | 1 | 2 | 79.250000 | 41 | 180 | 137.000000 | 0 | 466 |
| Amphitheater And Water Park | Private room | 2.500000 | 1 | 5 | 69.250000 | 55 | 86 | 33.250000 | 6 | 71 |
| Balboa Park | Private room | 7.285714 | 1 | 30 | 132.428571 | 50 | 800 | 70.257143 | 0 | 574 |
| Bay Ho | Private room | 6.823529 | 1 | 30 | 85.500000 | 41 | 245 | 58.147059 | 0 | 326 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| Valencia Park | Private room | 3.000000 | 1 | 6 | 52.285714 | 32 | 68 | 30.857143 | 1 | 170 |
| Webster | Private room | 9.666667 | 2 | 14 | 52.333333 | 40 | 70 | 2.666667 | 1 | 4 |
| West University Heights | Private room | 3.233333 | 1 | 30 | 385.900000 | 36 | 7181 | 51.866667 | 0 | 323 |
| Wooded Area | Private room | 1.500000 | 1 | 2 | 129.500000 | 129 | 130 | 30.000000 | 1 | 59 |

# Method Chaining


---

Method chaining is a programming technique in which multiple method calls are chained together in a single statement, using the result of one method call as the input for the next. In Pandas, this works because most DataFrame methods return a new DataFrame (or Series), so the next method can be called directly on that result. This can make the code more concise and easier to read.



```python
## -- Method Chaining / pipelining style -- 

RES1 = ( abnb
        .query('room_type == "Private room"')
        .groupby(["neighbourhood","room_type"])
        .agg( {"minimum_nights": ['mean','min','max'], 
               "price" : ['mean','min','max'], 
               "number_of_reviews" : ['mean','min','max']} )
        .stack()
        .reset_index()
        #.query('neighbourhood == "Alta Vista" ')
)[["neighbourhood", "level_2", "minimum_nights"]]

RES1
```

| | neighbourhood | level_2 | minimum_nights |
|---|---|---|---|
| 0 | Allied Gardens | mean | 6.00 |
| 1 | Allied Gardens | min | 1.00 |
| 2 | Allied Gardens | max | 29.00 |
| 3 | Alta Vista | mean | 1.25 |
| 4 | Alta Vista | min | 1.00 |
| ... | ... | ... | ... |
| 295 | Wooded Area | min | 1.00 |
| 296 | Wooded Area | max | 2.00 |
| 297 | Yosemite Dr | mean | 1.00 |
| 298 | Yosemite Dr | min | 1.00 |
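
To see that a chained expression is just a compact way of writing the same steps, the sketch below builds one result with intermediate variables and once more as a single chain. The `toy` DataFrame and variable names are hypothetical.

```python
import pandas as pd

# Hypothetical toy data.
toy = pd.DataFrame({
    "neighbourhood": ["X", "X", "Y", "Y", "Y"],
    "room_type": ["Private room", "Entire home/apt", "Private room",
                  "Private room", "Entire home/apt"],
    "price": [100, 400, 60, 80, 300],
})

# Step by step, with intermediate variables.
private = toy.query('room_type == "Private room"')
grouped = private.groupby("neighbourhood")["price"].mean()
step_by_step = grouped.sort_values(ascending=False).reset_index()

# The same pipeline as one chained expression; the parentheses allow line breaks.
chained = (
    toy
    .query('room_type == "Private room"')
    .groupby("neighbourhood")["price"]
    .mean()
    .sort_values(ascending=False)
    .reset_index()
)

print(chained)
print(chained.equals(step_by_step))
```

Wrapping the chain in parentheses lets each step sit on its own line, and a step can be commented out while debugging without rewriting the rest.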
