<a href="https://colab.research.google.com/github/chonginbilly/Moringa_DS/blob/Moringa_python/lamdafunctions.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<font color="green">*To start working on this notebook, or any other notebook that we will use in this course, we will need to save our own copy of it. We can do this by clicking File > Save a Copy in Drive. We will then be able to make edits to our own copy of this notebook.*</font>

---

# Lambda Functions

## Introduction

Lambda functions are handy tools for streamlining code and performing quick, one-off operations. Think of them as short, on-the-fly functions that don't require a formal name. They provide a quick and concise way to write simple functions; for anything more complicated, the more formal `def` statement is still the better choice.

## Objectives

By the end of this lesson, you will be able to:

* Understand the syntax and usage of lambda functions.
* Apply lambda functions effectively in tasks such as filtering, data transformation and sorting operations.
* Integrate lambda functions with built-in functions such as `map()` and `filter()` and with pandas methods such as `apply()`.
* Make informed decisions on when to use lambda functions in data science.

## Import libraries

In [None]:
import pandas as pd
import numpy as np

import os

## Load the data

[This](https://drive.google.com/file/d/1GoRRGv5ntTjzKIPMhrunDipX2ACqbPFV/view?usp=sharing) dataset contains customer feedback for British Airways.

Columns:

* **OverallRating**: The overall rating given by the customer.
* **ReviewHeader**: The header or title of the customer's review.
* **Name**: The name of the customer providing the feedback.
* **Datetime**: The date and time when the feedback was posted.
* **VerifiedReview**: Indicates whether the review is verified or not.
* **ReviewBody**: The detailed body of the customer's review.
* **TypeOfTraveller**: The type of traveler (e.g., Business, Leisure).
* **SeatType**: Class of the traveler (e.g. Business, Economy).
* **Route**: The flight route taken by the customer.
* **DateFlown**: The date when the flight was taken.
* **SeatComfort**: Rating for seat comfort.
* **CabinStaffService**: Rating for cabin staff service.
* **GroundService**: Rating for ground service.
* **ValueForMoney**: Rating for the value for money.
* **Recommended**: Whether the customer recommends British Airways.
* **Aircraft**: The aircraft used for the flight.
* **Food&Beverages**: Rating for food and beverages.
* **InflightEntertainment**: Rating for inflight entertainment.
* **Wifi&Connectivity**: Rating for onboard wifi and connectivity.


In [None]:
from google.colab import drive

drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
data_path = '/content/drive/MyDrive/Product/Naivas Big Data /Data'

os.chdir(data_path)

In [None]:
# import the data
df = pd.read_csv('BA_AirlineReviews.csv', index_col=0)
df.head()

Unnamed: 0,OverallRating,ReviewHeader,Name,Datetime,VerifiedReview,ReviewBody,TypeOfTraveller,SeatType,Route,DateFlown,SeatComfort,CabinStaffService,GroundService,ValueForMoney,Recommended,Aircraft,Food&Beverages,InflightEntertainment,Wifi&Connectivity
0,1.0,"""Service level far worse then Ryanair""",L Keele,19th November 2023,True,4 Hours before takeoff we received a Mail stat...,Couple Leisure,Economy Class,London to Stuttgart,November 2023,1.0,1.0,1.0,1.0,no,,,,
1,3.0,"""do not upgrade members based on status""",Austin Jones,19th November 2023,True,I recently had a delay on British Airways from...,Business,Economy Class,Brussels to London,November 2023,2.0,3.0,1.0,2.0,no,A320,1.0,2.0,2.0
2,8.0,"""Flight was smooth and quick""",M A Collie,16th November 2023,False,"Boarded on time, but it took ages to get to th...",Couple Leisure,Business Class,London Heathrow to Dublin,November 2023,3.0,3.0,4.0,3.0,yes,A320,4.0,,
3,1.0,"""Absolutely hopeless airline""",Nigel Dean,16th November 2023,True,"5 days before the flight, we were advised by B...",Couple Leisure,Economy Class,London to Dublin,December 2022,3.0,3.0,1.0,1.0,no,,,,
4,1.0,"""Customer Service is non existent""",Gaylynne Simpson,14th November 2023,False,"We traveled to Lisbon for our dream vacation, ...",Couple Leisure,Economy Class,London to Lisbon,November 2023,1.0,1.0,1.0,1.0,no,,1.0,1.0,1.0


In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 3701 entries, 0 to 3700
Data columns (total 19 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   OverallRating          3696 non-null   float64
 1   ReviewHeader           3701 non-null   object 
 2   Name                   3701 non-null   object 
 3   Datetime               3701 non-null   object 
 4   VerifiedReview         3701 non-null   bool   
 5   ReviewBody             3701 non-null   object 
 6   TypeOfTraveller        2930 non-null   object 
 7   SeatType               3699 non-null   object 
 8   Route                  2926 non-null   object 
 9   DateFlown              2923 non-null   object 
 10  SeatComfort            3585 non-null   float64
 11  CabinStaffService      3574 non-null   float64
 12  GroundService          2855 non-null   float64
 13  ValueForMoney          3700 non-null   float64
 14  Recommended            3701 non-null   object 
 15  Airc

## What are Lambda functions?

Also known as **anonymous** or **inline** functions, lambda functions are concise, one-line functions defined without a formal name using the `lambda` keyword.

Syntax:

`lambda x: x + 1`

The syntax consists of the `lambda` keyword, the arguments, a colon, and a single expression whose value is returned. As with the parameters of a `def` function or the loop variable in a `for` loop, the name we choose for the lambda's argument does not matter.
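As a minimal sketch of the syntax, the snippet below writes the same "add one" logic three ways. The names `add_one`, `add_one_renamed` and `add_one_def` are made up for this illustration, and the lambdas are assigned to variables only so that we can call them more than once.

```python
# Two lambdas that differ only in the name of their argument.
add_one = lambda x: x + 1
add_one_renamed = lambda number: number + 1  # the argument name is arbitrary

# The equivalent function written with def.
def add_one_def(x):
    return x + 1

print(add_one(5), add_one_renamed(5), add_one_def(5))  # 6 6 6
```

All three behave identically; the lambda simply skips the `def` line, the name and the `return` keyword.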

Let's say you want to count the number of words in each review, which can be useful in sentiment analysis.

In [None]:
df['ReviewBody'].map(lambda x: len(x.split())).head()

0    125
1    206
2     54
3    255
4    172
Name: ReviewBody, dtype: int64

## Lambda Functions vs General functions

| |lambda|general|
|:-------:|:-------:|:-------:|
|Definition|Defined using the `lambda` keyword|Defined using the `def` keyword|
|Arguments|Can have any number of arguments|Can have any number of arguments|
|Expressions|Has only one expression|Can have any number of expressions and lines of code|
|General usage|Used for one-line expressions|Used for larger blocks of code|
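The comparison can be sketched with a word counter written both ways. The names `count_words`, `count_words_lambda` and `scale` are hypothetical and used only for this illustration.

```python
# A general function can spread its logic over several statements.
def count_words(review):
    text = str(review).strip()   # step 1: convert to text and trim the ends
    words = text.split()         # step 2: split on whitespace
    return len(words)            # step 3: count the pieces

# A lambda has to express the same logic as one single expression.
count_words_lambda = lambda review: len(str(review).strip().split())

# Both forms accept any number of arguments, including defaults.
scale = lambda value, factor=2: value * factor

print(count_words('  Never  again '), count_words_lambda('  Never  again '))  # 2 2
print(scale(5), scale(5, 3))  # 10 15
```

Once the logic needs intermediate variables or more than one step, the `def` version tends to be easier to read.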


## Lambda IIFEs

IIFEs are **Immediately Invoked Function Expressions**: functions that are called in the same expression that defines them, so they never need to be given a name and called separately.

Let's create an IIFE using a lambda function that returns the square of a number:

In [None]:
# squares of number
squares = (lambda x: x**2)(10)
print(squares)

100


In [None]:
(lambda x, y: x*y)(5, 6)

30

## Lambda with `apply()`

To store the word count of each review in the dataframe, we can use the pandas `apply()` method. It applies a lambda function to each entry in the `ReviewBody` column, and we assign the result to a new column named `WordCount`. Wrapping each entry in `str()` guards against non-string values.

In [None]:
# count number of words per review
df['WordCount'] = df['ReviewBody'].apply(lambda review: len(str(review).split()))

In [None]:
df["WordCount"].head()

0    125
1    206
2     54
3    255
4    172
Name: WordCount, dtype: int64
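For comparison, here is a small sketch of the same word count on a toy Series, once with `apply()` and a lambda and once with the pandas `.str` accessor. The `reviews` Series is made-up data, not taken from the dataset.

```python
import pandas as pd

# Toy data standing in for the ReviewBody column (made up for this sketch).
reviews = pd.Series(['Flight was smooth and quick', 'Never again', 'Good crew, poor food'])

# Lambda applied to each entry, as in the cell above.
count_with_apply = reviews.apply(lambda review: len(str(review).split()))

# The same count using pandas string methods, with no lambda.
count_with_str = reviews.str.split().str.len()

print(count_with_apply.tolist())  # [5, 2, 4]
print(count_with_str.tolist())    # [5, 2, 4]
```

The two approaches agree on clean text but treat missing values differently: `str(review)` turns a missing value into the string `'nan'` and counts it as one word, whereas the `.str` methods leave it missing.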

## Lambda with `filter()` function

Now, let's count the passengers who gave an overall rating below 7, classifying them as **detractors**. This can be done with the built-in `filter()` function. When supplied with a lambda function and an iterable such as a pandas Series, `filter()` calls the lambda on each element and keeps only the elements for which it returns `True`.

Note that `filter()` returns an iterator over the matching ratings themselves, not a sequence of `True` and `False` values, so we wrap it in `list()` before counting. Missing ratings are left out because comparing `NaN` with `< 7` evaluates to `False`.

In [None]:
# keep only the detractors' ratings
detractors_scores = list(filter(lambda rating: rating < 7, df['OverallRating']))

len(detractors_scores)

2373
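To see what `filter()` actually returns, here is a minimal sketch on a plain list. The `ratings` list is made up for this illustration and includes one missing value.

```python
# Toy ratings, including one missing value (made up for this sketch).
ratings = [1.0, 3.0, 8.0, float('nan'), 9.0, 6.0]

# filter() keeps the elements for which the lambda returns True.
detractor_scores = list(filter(lambda rating: rating < 7, ratings))
print(detractor_scores)  # [1.0, 3.0, 6.0]

# map() with the same lambda shows the True/False value behind each decision.
flags = list(map(lambda rating: rating < 7, ratings))
print(flags)  # [True, True, False, False, False, True]
```

In pandas, the same selection is usually written as a boolean mask, `df[df['OverallRating'] < 7]`, which keeps the matching rows rather than only the scores.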

## Conditional statements using Lambda functions

Lambda functions also support conditional expressions of the form `a if condition else b`. Suppose we want to classify passengers as `Detractor`, `Passive` or `Promoter` based on the `OverallRating` column, following a Net Promoter Score (NPS) approach. We can do it as follows:

In [None]:
df['NPS_Category'] = df['OverallRating'].apply(lambda rating: 'Detractor' if rating < 7 else ('Passive' if 7 <= rating <= 8 else 'Promoter'))

In [None]:
df['NPS_Category'].value_counts()

Detractor    2373
Passive       683
Promoter      645
Name: NPS_Category, dtype: int64
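One thing to watch: a missing rating fails both comparisons in the lambda above, so it falls through to `'Promoter'`. This is why the three counts add up to all 3701 rows even though `OverallRating` has only 3696 non-null values. A named function is one way to handle this case explicitly. The sketch below is an assumption about how that could look; the `nps_category` name and the `'Unknown'` label are hypothetical and not part of the original notebook.

```python
import math

# The nested conditional lambda from the cell above.
as_lambda = lambda rating: 'Detractor' if rating < 7 else ('Passive' if 7 <= rating <= 8 else 'Promoter')

def nps_category(rating):
    """Classify a rating, giving missing values their own label."""
    if rating is None or math.isnan(rating):
        return 'Unknown'  # hypothetical label for missing ratings
    if rating < 7:
        return 'Detractor'
    if rating <= 8:
        return 'Passive'
    return 'Promoter'

ratings = [1.0, 7.0, 8.0, 9.0, float('nan')]
print([as_lambda(r) for r in ratings])
# ['Detractor', 'Passive', 'Passive', 'Promoter', 'Promoter']
print([nps_category(r) for r in ratings])
# ['Detractor', 'Passive', 'Passive', 'Promoter', 'Unknown']
```

The named function can be passed to pandas in the same way as the lambda, for example `df['OverallRating'].apply(nps_category)`.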

## Lambda with the `sort_values()` function

Passing a lambda function as the `key` argument of `sort_values()` lets us customize how a column is sorted. The `key` function receives the whole column as a Series and must return a Series of the same length, whose values are then used for the comparison. Here we apply it to the `TypeOfTraveller` column in our dataset.

In [None]:
sorted_df = df.sort_values(by='TypeOfTraveller', key=lambda x: x.str.lower())

sorted_df['TypeOfTraveller']


2032    Business
721     Business
722     Business
1886    Business
724     Business
          ...   
3696         NaN
3697         NaN
3698         NaN
3699         NaN
3700         NaN
Name: TypeOfTraveller, Length: 3701, dtype: object

The lambda function `lambda x: x.str.lower()` sorts the dataframe by the lowercase values in the `TypeOfTraveller` column, giving a case-insensitive ordering. Missing values are placed last.
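The effect of the `key` argument is easier to see on data with mixed capitalization. The sketch below uses a made-up `toy` dataframe with a hypothetical `traveller` column; it is not taken from the dataset.

```python
import pandas as pd

# Toy data with inconsistent capitalization (made up for this sketch).
toy = pd.DataFrame({'traveller': ['solo Leisure', 'Business', 'couple Leisure', 'Family Leisure']})

# Default sort: uppercase letters come before lowercase ones.
default_order = toy.sort_values(by='traveller')['traveller'].tolist()
print(default_order)
# ['Business', 'Family Leisure', 'couple Leisure', 'solo Leisure']

# Case-insensitive sort: key receives the whole column as a Series.
insensitive_order = toy.sort_values(by='traveller', key=lambda x: x.str.lower())['traveller'].tolist()
print(insensitive_order)
# ['Business', 'couple Leisure', 'Family Leisure', 'solo Leisure']
```

The stored values are unchanged; the lowercase version is used only to decide the order.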

## A general approach to writing [Data Transformation] Functions

Above, we've covered a lot of the syntax of lambda functions, but the thought process for writing these complex transformations was not transparent. Let's take a minute to discuss some approaches to tackling these problems.

## Experiment and solve for individual cases first

Before trying to write a function to apply to an entire series, it's typically easier to attempt to solve for an individual case. For example, if we're trying to determine the number of words in a review, we can try and do this for a single review first.

First, choose an example field that you'll be applying the function to.

In [None]:
example = df['ReviewBody'].iloc[0]

example

'4 Hours before takeoff we received a Mail stating a cryptic message that there are disruptions to be expected as there is a limit on how many planes can leave at the same time. So did the capacity of the Heathrow Airport really hit British Airways by surprise, 4h before departure? Anyhow - we took the one hour delay so what - but then we have been forced to check in our Hand luggage. I travel only with hand luggage to avoid waiting for the ultra slow processing of the checked in luggage. Overall 2h later at home than planed, with really no reason, just due to incompetent people. Service level far worse then Ryanair and triple the price. Really never again. Thanks for nothing.'

Then start writing the function for that example. For example, if we need to count the number of words, it's natural to first divide the review into words. A natural way to do this is with the `str.split()` method.



In [None]:
example.split()

['4',
 'Hours',
 'before',
 'takeoff',
 'we',
 'received',
 'a',
 'Mail',
 'stating',
 'a',
 'cryptic',
 'message',
 'that',
 'there',
 'are',
 'disruptions',
 'to',
 'be',
 'expected',
 'as',
 'there',
 'is',
 'a',
 'limit',
 'on',
 'how',
 'many',
 'planes',
 'can',
 'leave',
 'at',
 'the',
 'same',
 'time.',
 'So',
 'did',
 'the',
 'capacity',
 'of',
 'the',
 'Heathrow',
 'Airport',
 'really',
 'hit',
 'British',
 'Airways',
 'by',
 'surprise,',
 '4h',
 'before',
 'departure?',
 'Anyhow',
 '-',
 'we',
 'took',
 'the',
 'one',
 'hour',
 'delay',
 'so',
 'what',
 '-',
 'but',
 'then',
 'we',
 'have',
 'been',
 'forced',
 'to',
 'check',
 'in',
 'our',
 'Hand',
 'luggage.',
 'I',
 'travel',
 'only',
 'with',
 'hand',
 'luggage',
 'to',
 'avoid',
 'waiting',
 'for',
 'the',
 'ultra',
 'slow',
 'processing',
 'of',
 'the',
 'checked',
 'in',
 'luggage.',
 'Overall',
 '2h',
 'later',
 'at',
 'home',
 'than',
 'planed,',
 'with',
 'really',
 'no',
 'reason,',
 'just',
 'due',
 'to',
 'inco

Then we just need to count this!

In [None]:
len(example.split())

125

## Then return to solving for all!

In [None]:
df['ReviewBody'].map(lambda x: len(x.split())).head()

0    125
1    206
2     54
3    255
4    172
Name: ReviewBody, dtype: int64

## When to use lambda functions

In data science, we should judiciously apply lambda functions based on the specific context and requirements of our tasks. We use lambda functions when a concise and temporary function is needed, particularly for one-time use cases or short transformations.

We often turn to lambda functions when:

1. **Quick Transformations:** For rapid, on-the-fly data manipulations or transformations, lambda functions provide a convenient and succinct way to achieve the desired results without the need for a full-fledged named function.

2. **Anonymous Functions:** When the functionality is straightforward and doesn't warrant a separate function definition, lambda functions serve well as anonymous, inline functions. This can enhance code readability by keeping the logic close to where it's applied.

3. **Functional Arguments:** In scenarios where functions can accept other functions as arguments, such as with higher-order functions like `map()`, `filter()`, or `sorted()`, lambda functions offer a concise way to specify the function logic without the overhead of a named function.

4. **Reducing Code Lines:** Lambda functions are effective in situations where brevity and conciseness are priorities. They can help reduce the number of lines in your code, making it more compact and easier to comprehend.

However, it's essential to exercise caution and avoid excessive use of lambda functions, especially in scenarios where the functionality may become complex or requires reuse. For more complex logic or when functions need to be reused in multiple places, it's often better to define a named function for clarity and maintainability. Striking the right balance between readability and brevity is crucial in making informed decisions about when to leverage lambda functions in data science tasks.
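The trade-off can be sketched with a few plain Python lists. The `seat_types` values and the `normalise` helper are made up for this illustration.

```python
# Toy labels with inconsistent capitalization (made up for this sketch).
seat_types = ['Economy Class', 'business class', 'First Class', 'Premium Economy']

# One-off logic: an inline lambda keeps the rule next to where it is used.
alphabetical = sorted(seat_types, key=lambda s: s.lower())
print(alphabetical)
# ['business class', 'Economy Class', 'First Class', 'Premium Economy']

# Reused logic: a named function avoids repeating the same lambda.
def normalise(label):
    """Hypothetical helper: trim whitespace and lowercase a label."""
    return label.strip().lower()

cleaned = list(map(normalise, seat_types))
classes_only = list(filter(lambda s: 'class' in normalise(s), seat_types))
print(cleaned)
# ['economy class', 'business class', 'first class', 'premium economy']
print(classes_only)
# ['Economy Class', 'business class', 'First Class']
```

The lambda passed to `sorted()` is used once and stays readable inline, while `normalise` earns a name because it is needed in more than one place.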


## Summary

Lambda functions prove valuable for swift, one-time data manipulations or brief transformations, offering a concise alternative to full-fledged named functions. They excel in scenarios requiring quick transformations, serving as anonymous, inline functions for straightforward functionality. Lambda functions are particularly handy when working with functional arguments in higher-order functions like `map()`, `filter()`, or `sort_values()`, providing a succinct way to express logic without the overhead of defining a named function. Their effectiveness lies in reducing code lines, enhancing code readability, and streamlining concise operations. However, caution is advised to avoid overuse, especially when dealing with complex functionality or the need for code reuse. In such cases, opting for named functions contributes to clarity and maintainability, ensuring a balanced approach between readability and brevity in data science tasks.