## Assign, Map, Query, and Explode:

### 1. Assign:
The first method we look at is assign. It adds columns to a DataFrame and returns the result as a new DataFrame, leaving the original untouched.

In [1]:

import pandas as pd
data = pd.read_csv("https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv")

grouped = (
    data.groupby("species").agg(["mean"])
    .assign(
        fancy_column=lambda df: df["sepal_width"]["mean"]
        / df["sepal_width"]["mean"].mean(),
        useless_column="I am useless"
    )
)
grouped

species,sepal_length (mean),sepal_width (mean),petal_length (mean),petal_width (mean),fancy_column,useless_column
setosa,5.006,3.428,1.462,0.246,1.121239,I am useless
versicolor,5.936,2.77,4.26,1.326,0.906018,I am useless
virginica,6.588,2.974,5.552,2.026,0.972743,I am useless


We created two new columns by passing their names to assign as keyword arguments, each set to the values the new column should hold.

To specify the values, you have three options:

- A scalar value, which sets every entry of the new column to that value (useless_column above).
- An array or Series, which is used as the column’s values. An array must have the same length as the DataFrame that assign is invoked on; a Series is aligned on the index.
- A function, or more generally a callable, that takes the DataFrame as its only input and returns a scalar or a Series (fancy_column above). A returned Series is treated the same way as one passed directly.
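The three options can be compared side by side in a minimal sketch. The toy DataFrame and the column names below are made up for illustration and are not part of the iris example.

```python
import pandas as pd

# Hypothetical toy data, not the iris dataset used above.
df = pd.DataFrame({"width": [1.0, 2.0, 3.0]})

result = df.assign(
    # Scalar: the same value is written to every row.
    from_scalar="constant",
    # List/array: used as-is, so its length must match len(df).
    from_array=[10, 20, 30],
    # Callable: receives the DataFrame and returns the new column.
    from_callable=lambda d: d["width"] / d["width"].mean(),
)

print(result)
# assign returns a new DataFrame; df itself still has only one column.
print(list(df.columns))
```

Because the callable receives the DataFrame as it exists at that point in the chain, it is the option that works best inside long method chains such as the groupby example above.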

### 2. Map:
With map you can substitute every value in a Series with another value. The mapping can be given as a dictionary, a Series, or a function.

In [4]:
import pandas as pd
data = pd.read_csv(
    "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv"
).assign(
    to_big_to_small=lambda df: (df.sepal_width > 3).map({True: "Too Big", False: "Perfect"}),
    inverted_name=lambda df: df.species.map(lambda name: name[::-1]),
)
data

,sepal_length,sepal_width,petal_length,petal_width,species,to_big_to_small,inverted_name
0,5.1,3.5,1.4,0.2,setosa,Too Big,asotes
1,4.9,3.0,1.4,0.2,setosa,Perfect,asotes
2,4.7,3.2,1.3,0.2,setosa,Too Big,asotes
3,4.6,3.1,1.5,0.2,setosa,Too Big,asotes
4,5.0,3.6,1.4,0.2,setosa,Too Big,asotes
...,...,...,...,...,...,...,...
145,6.7,3.0,5.2,2.3,virginica,Perfect,acinigriv
146,6.3,2.5,5.0,1.9,virginica,Perfect,acinigriv
147,6.5,3.0,5.2,2.0,virginica,Perfect,acinigriv
148,6.2,3.4,5.4,2.3,virginica,Too Big,acinigriv


In this code, we use map twice. First, a dictionary translates the boolean result of the comparison sepal_width > 3 into one of two strings for the to_big_to_small column. Second, a function reverses each species name to fill the new inverted_name column.
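One detail worth knowing when mapping with a dictionary is what happens to values that have no matching key. The following sketch uses a small made-up Series (the value "unknown" is hypothetical) to show both the dictionary and the function form.

```python
import pandas as pd

# Hypothetical Series; "unknown" has no entry in the dictionary below.
species = pd.Series(["setosa", "virginica", "unknown"])

# Dictionary lookup: values without a matching key become NaN.
short = species.map({"setosa": "S", "virginica": "V"})

# Function: called once per element.
upper = species.map(str.upper)

print(short)
print(upper)
```

If missing keys should keep their original value instead, one option is to chain `.fillna(species)` after the dictionary-based map.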

### 3. Query:
A convenient function for extracting rows from a DataFrame is the query function.

In [3]:
import pandas as pd
data = pd.read_csv("https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv")
length_th = 0.5
filtered_data = (data
    .assign(**{"PW Squared": data["petal_width"] ** 2})
    .query("`PW Squared` > 0.4 and petal_length > @length_th and species != 'setosa'")
)
filtered_data

,sepal_length,sepal_width,petal_length,petal_width,species,PW Squared
50,7.0,3.2,4.7,1.4,versicolor,1.96
51,6.4,3.2,4.5,1.5,versicolor,2.25
52,6.9,3.1,4.9,1.5,versicolor,2.25
53,5.5,2.3,4.0,1.3,versicolor,1.69
54,6.5,2.8,4.6,1.5,versicolor,2.25
...,...,...,...,...,...,...
145,6.7,3.0,5.2,2.3,virginica,5.29
146,6.3,2.5,5.0,1.9,virginica,3.61
147,6.5,3.0,5.2,2.0,virginica,4.00
148,6.2,3.4,5.4,2.3,virginica,5.29


We have selected a subset of all rows using a logical filter expression that we passed to query. The filter expression is a string that can contain comparisons such as >, <, >=, <=, != and ==, combined with and, or and not, to compare columns of your DataFrame. Column names that contain spaces, like PW Squared, are wrapped in backticks, and a name prefixed with @, like @length_th, refers to a Python variable defined outside the expression. The query function evaluates the expression and returns all rows for which it evaluates to True.
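To make the correspondence with ordinary boolean indexing concrete, here is a small sketch that filters the same rows both ways. The DataFrame, the column name with a space, and the threshold `min_length` are illustrative assumptions.

```python
import pandas as pd

# Hypothetical toy data with a column name that contains a space.
df = pd.DataFrame(
    {
        "petal length": [1.4, 4.7, 5.2],
        "species": ["setosa", "versicolor", "virginica"],
    }
)
min_length = 2.0  # hypothetical threshold, referenced via @ below

# Backticks quote the column name; @ pulls in the local variable.
with_query = df.query("`petal length` > @min_length and species != 'virginica'")

# The same filter written as a boolean mask.
with_mask = df[(df["petal length"] > min_length) & (df["species"] != "virginica")]

print(with_query)
print(with_query.equals(with_mask))
```

Both versions select the same rows; query mainly saves repeating the DataFrame name and fits naturally into a method chain.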

### 4. Explode:
The explode function is useful when the entries of a column are list-like. It creates one new row per element of each list. All other entries of the original row, including its index label, are replicated for each new row. You invoke it by passing the name of the column that contains the list-like objects. As always, let’s use an example to make that more tangible.

In [2]:
import pandas as pd
n_rows = 3
result = pd.DataFrame(
    {"a": [list(range(1 + i ** 2)) for i in range(n_rows)], "b": list(range(n_rows))}
).explode("a").astype({'a':int})
result

,a,b
0,0,0
1,0,1
1,1,1
2,0,2
2,1,2
2,2,2
2,3,2
2,4,2


In the given example, we first create a DataFrame with 3 rows. The entries of column a are initially lists of lengths 1, 2, and 5 respectively, because row i holds list(range(1 + i ** 2)). After exploding the DataFrame on column a, the resulting DataFrame has 1 + 2 + 5 = 8 rows, and the original index labels 0, 1, and 2 are repeated accordingly.
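Since explode repeats the original index labels, it can be handy to renumber the rows afterwards. A minimal sketch, assuming a pandas version that supports the `ignore_index` parameter of explode (1.1.0 or later) and using made-up column names:

```python
import pandas as pd

# Hypothetical toy data: one list-like column and one ordinary column.
df = pd.DataFrame({"tags": [["a"], ["b", "c"]], "id": [10, 20]})

# Default: the index label of the source row is repeated.
kept = df.explode("tags")

# ignore_index=True renumbers the rows 0..n-1 instead.
fresh = df.explode("tags", ignore_index=True)

print(kept)
print(fresh)
```

Keeping the repeated index is useful when you later want to group the exploded rows back by their source row; a fresh index is simpler when the origin no longer matters.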