## Four Pandas Functions to Level Up Your Pandas Skills

In [1]:
import pandas as pd

In [2]:
import pandas as pd
data = pd.read_csv("demofile.csv")

data.columns

Index(['name', 'EmpID', 'EmpStatusID', 'DeptID', 'PerfScoreID', 'Salary',
       'Termd', 'PositionID', 'Position', 'Hiring location', 'Zip', 'DOB',
       'Sex', 'MaritalDesc', 'Citizenship', 'DateofJoining',
       'DateofTermination', 'TermReason', 'EmploymentStatus', 'Training hours',
       'Trainings', 'Department', 'ManagerName', 'Recruitment Source',
       'PerformanceScore', 'EngagementSurvey', 'Zip.1',
       'LastPerformanceReview_Date', 'DaysLateLast30', 'Absences', 'EmpType',
       'OTHours'],
      dtype='object')

## Assign

The assign method adds new columns to a DataFrame. It returns a new DataFrame and leaves the original unchanged.

In [5]:
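grouped = (
    # Select the numeric column first so "mean" is not applied to text columns
    data.groupby("DeptID")[["PositionID"]]
    .agg(["mean"])
    .assign(
        # Each department's mean PositionID relative to the average of those means
        fancy_column=lambda df: df["PositionID"]["mean"]
        / df["PositionID"]["mean"].mean(),
        # A scalar value is broadcast to every row
        useless_column="I am useless"
    )
)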



We can create multiple columns with a single assign call by passing multiple keyword arguments, one for each column to be created.
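As a minimal sketch of the same idea on data you can run without `demofile.csv`, the snippet below uses a small hypothetical DataFrame (`toy`, with made-up values). It shows a column computed by a callable, a later keyword argument that refers to a column created earlier in the same call, and a scalar that is broadcast to every row.

```python
import pandas as pd

# Hypothetical toy data standing in for demofile.csv
toy = pd.DataFrame({"DeptID": [1, 1, 2], "Salary": [50000, 70000, 60000]})

with_new_columns = toy.assign(
    # A callable receives the DataFrame that assign is called on
    salary_ratio=lambda df: df["Salary"] / df["Salary"].mean(),
    # Later keyword arguments may refer to columns created earlier in the call
    above_average=lambda df: df["salary_ratio"] > 1,
    # A scalar is broadcast to every row
    source="toy data",
)

print(with_new_columns)
print(list(toy.columns))  # the original DataFrame is unchanged
```

Because assign returns a new DataFrame, it fits naturally into method chains like the grouped example above.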

## Map

Another handy function that can be invoked on Pandas Series objects is the map function. With map you can substitute every value in a Series with another value, using either a dictionary or a function.

In [9]:
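import pandas as pd

data = pd.read_csv("demofile.csv").assign(
    # Dictionary mapping: translate the boolean comparison result into labels
    to_big_to_small=lambda df: (df.PositionID > 3).map({True: "Too Big", False: "Perfect"}),
    # Function mapping: reverse each position title character by character
    inverted_name=lambda df: df.Position.map(lambda name: name[::-1]),
)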
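The two styles of mapping used above can be tried in isolation. The following is a small sketch on a hypothetical Series (`scores`, with made-up values). It also illustrates one behaviour worth knowing: with a dictionary, values that have no matching key become NaN.

```python
import pandas as pd

# Hypothetical example values
scores = pd.Series([1, 2, 3, 4])

# Dictionary mapping: 4 has no key in the dictionary, so it maps to NaN
labels = scores.map({1: "low", 2: "medium", 3: "high"})

# Function mapping: the function is applied to each element
doubled = scores.map(lambda value: value * 2)

print(labels)
print(doubled)
```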

## Query

After looking into functions to manipulate data containers, let's have a look at how we can extract data from DataFrames. One useful function to extract rows is the query function.

In [12]:
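import pandas as pd

data = pd.read_csv("demofile.csv")
length_th = 0.5  # local variable, referenced inside the query string with @
filtered_data = (
    data
    # Column names that contain spaces must be wrapped in backticks inside query
    .assign(**{"PW Squared": data["PositionID"] ** 2})
    # 1 is an arbitrary example value, assuming DeptID holds numeric IDs
    .query("`PW Squared` > 0.4 and PositionID > @length_th and DeptID != 1")
)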

In the code above, we selected a subset of rows by passing a logical filter expression to query. The filter expression is a string that can contain comparisons such as >, <, >=, <=, != and == on the columns of your DataFrame. Local Python variables are referenced with an @ prefix, and column names that contain spaces are wrapped in backticks. The query function evaluates the expression and returns all rows for which it evaluates to True.
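To make the behaviour of query easier to check, here is a small sketch on a hypothetical DataFrame (`toy`, with made-up values) that compares a query string with the equivalent boolean-mask selection. The variable `min_position` is an illustrative name, not part of the original dataset.

```python
import pandas as pd

# Hypothetical toy data; "Training hours" deliberately contains a space
toy = pd.DataFrame(
    {"PositionID": [1, 2, 3, 4], "Training hours": [10, 40, 25, 5]}
)
min_position = 2

# query: @ refers to a local variable, backticks quote a column name with a space
subset = toy.query("PositionID >= @min_position and `Training hours` > 20")

# The same filter written as a boolean mask
mask_subset = toy[(toy["PositionID"] >= min_position) & (toy["Training hours"] > 20)]

print(subset)
print(subset.equals(mask_subset))
```

Both forms select the same rows. The query string tends to read better inside a method chain, because it does not need to refer to the intermediate DataFrame by name.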

## Explode

The explode function is useful when the entries of a column are list-like. Concretely, it creates a new row for each entry of these lists. All other entries of the row are replicated, including the index. You invoke it by passing the name of the column that contains the list-like objects.

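Note that explode doesn't change the data type of the column it is applied to: the exploded column keeps the object dtype, so cast it with astype if you need a numeric type.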

In [13]:
import pandas as pd
n_rows = 3
result = pd.DataFrame(
    {"a": [list(range(1 + i ** 2)) for i in range(n_rows)], "b": list(range(n_rows))}
).explode("a").astype({'a':int})
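The points about the replicated index and the unchanged dtype can be verified on a small hypothetical DataFrame (`toy`, with made-up values). This is only a sketch; resetting the index afterwards is an optional extra step, not something the original example does.

```python
import pandas as pd

# Hypothetical toy data: each row holds a list of training hours
toy = pd.DataFrame({"name": ["Ann", "Bob"], "hours": [[2, 3], [5]]})

exploded = toy.explode("hours")

print(exploded.index.tolist())   # index labels are repeated for rows from the same list
print(exploded["hours"].dtype)   # still object, although the entries are integers

# Cast to an integer dtype and, optionally, build a fresh unique index
tidy = exploded.astype({"hours": int}).reset_index(drop=True)
print(tidy)
```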