# Lambda Functions - Lab

## Introduction

In this lab, you'll get some hands-on practice creating and using lambda functions.

## Objectives

In this lab you will:

* Create lambda functions to use as arguments of other functions   
* Use the `.map()` or `.apply()` method to apply a function to a pandas series or DataFrame
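
As a quick orientation before the exercises, the sketch below contrasts the two methods named in the objectives. The `toy` DataFrame and its values are made up for illustration and are not part of the lab's Yelp dataset.

```python
import pandas as pd

# Hypothetical toy data, not the lab's Yelp dataset
toy = pd.DataFrame({"stars": [5, 1, 4], "useful": [0, 3, 1]})

# Series.map() calls the lambda once per element of a single column
doubled = toy["stars"].map(lambda x: x * 2)

# DataFrame.apply() with axis=1 passes each row to the lambda,
# so the lambda can combine several columns
total = toy.apply(lambda row: row["stars"] + row["useful"], axis=1)

print(doubled.tolist())  # [10, 2, 8]
print(total.tolist())    # [5, 4, 5]
```

In short, `.map()` is the natural fit when the lambda needs one value at a time, while `.apply()` with `axis=1` is useful when it needs a whole row.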

## Lambda Functions

In [1]:
#Importing pandas library
import pandas as pd
#Loading the csv file
df = pd.read_csv('Yelp_Reviews.csv', index_col=0)
#Previewing the first two rows
df.head(2)

| | business_id | cool | date | funny | review_id | stars | text | useful | user_id |
|---|---|---|---|---|---|---|---|---|---|
| 1 | pomGBqfbxcqPv14c3XH-ZQ | 0 | 2012-11-13 | 0 | dDl8zu1vWPdKGihJrwQbpw | 5 | I love this place! My fiance And I go here atl... | 0 | msQe1u7Z_XuqjGoqhB0J5g |
| 2 | jtQARsP6P-LbkyjbO1qNGg | 1 | 2014-10-23 | 1 | LZp4UX5zK3e-c5ZGSeo3kA | 1 | Terrible. Dry corn bread. Rib tips were all fa... | 3 | msQe1u7Z_XuqjGoqhB0J5g |


In [2]:
df.columns #Previewing the columns 

Index(['business_id', 'cool', 'date', 'funny', 'review_id', 'stars', 'text',
       'useful', 'user_id'],
      dtype='object')

In [3]:
len(df.columns)  # Number of columns before adding any new ones

9

In [4]:
df.shape  # We have 2610 rows and 9 columns

(2610, 9)

In [5]:
df.info() #A summary of our data

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2610 entries, 1 to 4206
Data columns (total 9 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   business_id  2610 non-null   object
 1   cool         2610 non-null   int64 
 2   date         2610 non-null   object
 3   funny        2610 non-null   int64 
 4   review_id    2610 non-null   object
 5   stars        2610 non-null   int64 
 6   text         2610 non-null   object
 7   useful       2610 non-null   int64 
 8   user_id      2610 non-null   object
dtypes: int64(4), object(5)
memory usage: 203.9+ KB


In [6]:
df.describe() #The statistical summary of our DataFrame

| | cool | funny | stars | useful |
|---|---|---|---|---|
| count | 2610.0 | 2610.0 | 2610.0 | 2610.0 |
| mean | 0.229119 | 0.211877 | 3.724521 | 0.872414 |
| std | 0.713175 | 0.793281 | 1.52467 | 1.862308 |
| min | 0.0 | 0.0 | 1.0 | 0.0 |
| 25% | 0.0 | 0.0 | 3.0 | 0.0 |
| 50% | 0.0 | 0.0 | 4.0 | 0.0 |
| 75% | 0.0 | 0.0 | 5.0 | 1.0 |
| max | 14.0 | 16.0 | 5.0 | 24.0 |


## Simple arithmetic

Use a lambda function to create a new column called `'stars_squared'` by squaring the stars column.

In [7]:
# Your code here
df['stars_squared'] = df['stars'].map(lambda x: x**2)
df['stars_squared']

1       25
2        1
4       25
5        1
10      25
        ..
689     25
4874    25
564     25
3458     4
4206     1
Name: stars_squared, Length: 2610, dtype: int64
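
For comparison, here is a minimal sketch showing that the lambda version and pandas' built-in element-wise arithmetic give the same result. The `stars` Series below holds made-up sample ratings, not the lab's data.

```python
import pandas as pd

# Hypothetical sample ratings, not the lab's Yelp data
stars = pd.Series([5, 1, 5, 2], name="stars")

# Lambda applied element by element, as in the exercise
squared_with_lambda = stars.map(lambda x: x ** 2)

# Vectorized arithmetic on the whole Series at once
squared_vectorized = stars ** 2

print(squared_with_lambda.tolist())                    # [25, 1, 25, 4]
print(squared_with_lambda.equals(squared_vectorized))  # True
```

The lambda is what this lab practises; for simple arithmetic like this, the vectorized form is usually shorter and faster on large columns.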

In [8]:
df['stars']  # Displaying the 'stars' column

1       5
2       1
4       5
5       1
10      5
       ..
689     5
4874    5
564     5
3458    2
4206    1
Name: stars, Length: 2610, dtype: int64

In [9]:
df.columns #After adding the stars_squared

Index(['business_id', 'cool', 'date', 'funny', 'review_id', 'stars', 'text',
       'useful', 'user_id', 'stars_squared'],
      dtype='object')

In [10]:
len(df.columns)  # After adding the 'stars_squared' column

10

In [11]:
df.shape #We now have 2610 rows and 10 columns

(2610, 10)

In [12]:
type(df) #Checking our datatype

pandas.core.frame.DataFrame

## Dates
Select the month from the date string using a lambda function.

In [13]:
df['date']

1       2012-11-13
2       2014-10-23
4       2014-09-05
5       2011-02-25
10      2016-06-15
           ...    
689     2013-06-02
4874    2016-08-14
564     2016-06-14
3458    2013-10-02
4206    2016-08-15
Name: date, Length: 2610, dtype: object

In [14]:
# This option displays the full text of each column instead of truncating it
pd.set_option('display.max_colwidth', None)

In [15]:
# Your code here
# Use map() to go through the 'date' column and select the month from each value
df['date'].map(lambda x: x[5:7]).head()
# In a string such as '2013-06-02', the month sits at positions 5 and 6, so the slice [5:7] extracts it
# head() previews the first 5 rows

1     11
2     10
4     09
5     02
10    06
Name: date, dtype: object
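
The slice above depends on every date having the exact `YYYY-MM-DD` layout. The sketch below, which uses a few made-up date strings, shows two alternatives under that same assumption: splitting on the hyphen, and converting to datetime first.

```python
import pandas as pd

# Hypothetical sample dates in the same YYYY-MM-DD layout as the lab's column
dates = pd.Series(["2012-11-13", "2014-10-23", "2014-09-05"])

# Fixed-position slice, as in the exercise: keeps zero-padded strings
month_slice = dates.map(lambda x: x[5:7])

# Splitting on '-' gives the same strings without counting positions
month_split = dates.map(lambda x: x.split("-")[1])

# Converting to datetime yields integer months instead of strings
month_int = pd.to_datetime(dates).dt.month

print(month_slice.tolist())  # ['11', '10', '09']
print(month_split.tolist())  # ['11', '10', '09']
print(month_int.tolist())    # [11, 10, 9]
```

Note the difference in type: the string approaches return `'09'`, while the datetime approach returns the integer `9`.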

In [16]:
len(df['date'])

2610

## What is the average number of words for a yelp review?
Do this with a single line of code.

In [17]:
# Your code here
df['text'].map(lambda x: len(x.split())).mean() #Finding the average/mean

77.06551724137931
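
To make the one-liner easier to follow, the sketch below runs the same steps on three made-up review strings and compares the result with pandas' `.str` accessor, which performs the split and count without a lambda.

```python
import pandas as pd

# Hypothetical sample reviews, not taken from the lab's dataset
reviews = pd.Series(["I love this place", "Terrible. Dry corn bread.", "Great"])

# split() with no arguments breaks on any whitespace; len() counts the pieces
avg_with_lambda = reviews.map(lambda x: len(x.split())).mean()

# Same computation using the vectorized string accessor
avg_with_str = reviews.str.split().str.len().mean()

print(avg_with_lambda)  # 3.0
print(avg_with_str)     # 3.0
```

The word counts here are 4, 4, and 1, so both approaches report a mean of 3.0.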

## Create a new column for the number of words in the review

In [18]:
# Your code here
# Creating a new column called 'Number_of_words'
df['Number_of_words'] = df['text'].map(lambda x: len(x.split()))
df['Number_of_words']

1        58
2        30
4        30
5        82
10       32
       ... 
689      61
4874     43
564      79
3458    185
4206     42
Name: Number_of_words, Length: 2610, dtype: int64

In [19]:
df.columns #Checking if our added column is updated

Index(['business_id', 'cool', 'date', 'funny', 'review_id', 'stars', 'text',
       'useful', 'user_id', 'stars_squared', 'Number_of_words'],
      dtype='object')

In [20]:
len(df.columns)

11

## Rewrite the following as a lambda function

Create a new column `'Review_length'` by applying this lambda function to the `'Number_of_words'` column created above.

In [22]:
# Rewrite the following function as a lambda function
def rewrite_as_lambda(value):
    if len(value) < 50:
        return 'Short'
    elif len(value) < 80:
        return 'Medium'
    else:
        return 'Long'
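# Hint: nest your if, else conditionals

# 'Number_of_words' already holds integer word counts, so the lambda compares
# each value directly instead of calling len() on it
df['Review_length'] = df['Number_of_words'].map(lambda value: 'Short' if value < 50 else 'Medium' if value < 80 else 'Long')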

In [23]:
df['Review_length']

1       Medium
2        Short
4        Short
5         Long
10       Short
         ...  
689     Medium
4874     Short
564     Medium
3458      Long
4206     Short
Name: Review_length, Length: 2610, dtype: object
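
To check how the nested conditional behaves at the thresholds, the sketch below compares it with an equivalent named function on a few made-up word counts. `label_length` is a hypothetical helper written for this illustration; unlike the function in the exercise, it takes a word count directly.

```python
import pandas as pd

def label_length(num_words):
    """Hypothetical helper: classify a review by its word count."""
    if num_words < 50:
        return 'Short'
    elif num_words < 80:
        return 'Medium'
    else:
        return 'Long'

# Made-up word counts chosen to sit on both sides of each threshold
counts = pd.Series([49, 50, 79, 80])

# The nested conditional reads left to right, like if / elif / else
with_lambda = counts.map(lambda n: 'Short' if n < 50 else 'Medium' if n < 80 else 'Long')
with_function = counts.map(label_length)

print(with_lambda.tolist())               # ['Short', 'Medium', 'Medium', 'Long']
print(with_lambda.equals(with_function))  # True
```

Both versions place 50 and 79 in `'Medium'`, because each `<` comparison excludes its upper bound.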

## Level Up: Dates Advanced
<img src="images/world_map.png" width="600">  

Print the first five rows of the `'date'` column. 

In [25]:
# Your code here
#Previewing the first 5 rows using the head() function
df['date'].head()

1     2012-11-13
2     2014-10-23
4     2014-09-05
5     2011-02-25
10    2016-06-15
Name: date, dtype: object

In [26]:
df.date[:5]

1     2012-11-13
2     2014-10-23
4     2014-09-05
5     2011-02-25
10    2016-06-15
Name: date, dtype: object

Overwrite the `'date'` column by reordering the month and day from `YYYY-MM-DD` to `DD-MM-YYYY`. Try to do this using a lambda function.

In [32]:
# Your code here

date = pd.to_datetime(df['date'])  # Convert the 'date' column from strings to datetime values

# map() applies the lambda (an anonymous function) to each value in the Series;
# strftime("%d-%m-%Y") turns each datetime into a DD-MM-YYYY string.
# Assigning the result back overwrites the 'date' column, as the task asks.
df['date'] = date.map(lambda x: x.strftime("%d-%m-%Y"))
df['date']


1       13-11-2012
2       23-10-2014
4       05-09-2014
5       25-02-2011
10      15-06-2016
           ...    
689     02-06-2013
4874    14-08-2016
564     14-06-2016
3458    02-10-2013
4206    15-08-2016
Name: date, Length: 2610, dtype: object
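
Because the dates start out as strings, the reordering can also be done with string operations alone. The sketch below uses two made-up dates and assumes every value follows the `YYYY-MM-DD` layout; it compares that approach with the datetime route.

```python
import pandas as pd

# Hypothetical sample dates in YYYY-MM-DD format
dates = pd.Series(["2012-11-13", "2014-10-23"])

# String-only approach: split into [YYYY, MM, DD], reverse, and rejoin
reordered = dates.map(lambda x: "-".join(reversed(x.split("-"))))

# Datetime approach: parse first, then format each value with strftime
via_datetime = pd.to_datetime(dates).map(lambda x: x.strftime("%d-%m-%Y"))

print(reordered.tolist())     # ['13-11-2012', '23-10-2014']
print(via_datetime.tolist())  # ['13-11-2012', '23-10-2014']
```

The string version skips parsing but silently accepts malformed input, whereas `pd.to_datetime` raises an error on values it cannot interpret as dates.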

## Summary

Hopefully, you're getting the hang of lambda functions now! It's important not to overuse them: it will often make more sense to define a named function so that it's reusable elsewhere. But whenever you need to quickly apply some simple processing to a collection of data, you have a new technique that will help you do just that. It'll also be useful when you're reading someone else's code that happens to use lambdas.