# Lambda Functions - Lab

## Introduction

In this lab, you'll get some hands-on practice creating and using lambda functions.

## Objectives
In this lab you will: 
* Create lambda functions to use as arguments of other functions   
* Use the `.map()` or `.apply()` method to apply a function to a pandas Series or DataFrame

## Lambda Functions
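
As a quick refresher before working with the data, the sketch below passes a lambda as an argument to another function, here the built-in `sorted()`. The `reviews` list and the `get_stars` helper are made up for illustration and are not part of the lab's dataset.

```python
# Hypothetical sample data: (business, star rating) pairs
reviews = [("diner", 3), ("cafe", 5), ("bar", 1)]

# sorted() accepts a function through its `key` argument;
# the lambda picks the star rating out of each tuple
by_stars = sorted(reviews, key=lambda pair: pair[1])

# The same logic written as a named function, for comparison
def get_stars(pair):
    return pair[1]

by_stars_named = sorted(reviews, key=get_stars)

print(by_stars)
print(by_stars == by_stars_named)
```

The lambda and the named function are interchangeable here; the lambda simply saves a definition when the logic is only needed once.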

In [11]:
import pandas as pd
df = pd.read_csv('Yelp_Reviews.csv', index_col=0)
df.head(2)

| Unnamed: 0 | business_id | cool | date | funny | review_id | stars | text | useful | user_id |
|---|---|---|---|---|---|---|---|---|---|
| 1 | pomGBqfbxcqPv14c3XH-ZQ | 0 | 2012-11-13 | 0 | dDl8zu1vWPdKGihJrwQbpw | 5 | I love this place! My fiance And I go here atl... | 0 | msQe1u7Z_XuqjGoqhB0J5g |
| 2 | jtQARsP6P-LbkyjbO1qNGg | 1 | 2014-10-23 | 1 | LZp4UX5zK3e-c5ZGSeo3kA | 1 | Terrible. Dry corn bread. Rib tips were all fa... | 3 | msQe1u7Z_XuqjGoqhB0J5g |


## Simple arithmetic

Use a lambda function to create a new column called `'stars_squared'` by squaring the stars column.

In [12]:
# Your code here
df['stars_squared']=df['stars'].map(lambda x:x**2)
df['stars_squared'].head(2)

1    25
2     1
Name: stars_squared, dtype: int64
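
For simple arithmetic like this, a lambda passed to `.map()` and a plain vectorized expression should give the same result. The sketch below checks that on a small made-up Series (the values are illustrative, not from `Yelp_Reviews.csv`).

```python
import pandas as pd

# Made-up star ratings for illustration
stars = pd.Series([5, 1, 4], name="stars")

# Element by element: the lambda is called once per value
squared_map = stars.map(lambda x: x ** 2)

# Vectorized: the operator is applied to the whole Series at once
squared_vec = stars ** 2

print(squared_map.tolist())
print(squared_map.tolist() == squared_vec.tolist())
```

The lambda version is the point of this lab, but the vectorized form is usually shorter and faster when an operator already does the job.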

## Dates
Select the month from the date string using a lambda function.

In [13]:
# Your code here
df['date'].map(lambda x:x[5:7]).head(2)

1    11
2    10
Name: date, dtype: object
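
Note that slicing a date string returns another string, which is why the output above has dtype `object`. A minimal sketch on made-up dates, assuming the month is wanted as a number, wraps the slice in `int()`:

```python
import pandas as pd

# Made-up 'YYYY-MM-DD' strings for illustration
dates = pd.Series(["2012-11-13", "2014-10-23", "2014-09-05"])

# Characters 5 and 6 hold the month; the result is still a string
month_str = dates.map(lambda x: x[5:7])

# Converting inside the lambda yields integers instead
month_int = dates.map(lambda x: int(x[5:7]))

print(month_str.tolist())
print(month_int.tolist())
```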

## What is the average number of words for a Yelp review?
Do this with a single line of code!

In [14]:
# Your code here
df['text'].map(lambda x: len(x.split())).mean()

77.06551724137931
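
To see what the one-liner is doing, the sketch below runs the same lambda on two made-up reviews. Calling `split()` with no arguments splits on any run of whitespace, so repeated spaces do not inflate the count.

```python
import pandas as pd

# Made-up review texts; the second one contains extra spaces
texts = pd.Series(["I love this place", "Great   tacos"])

# Word count per review: split on whitespace, then count the pieces
word_counts = texts.map(lambda x: len(x.split()))

# Average number of words across the reviews
average_words = word_counts.mean()

print(word_counts.tolist())
print(average_words)
```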

## Create a new column for the number of words in the review

In [15]:
# Your code here
df['Review_total_words']=df['text'].map(lambda x: len(x.split()))
df['Review_total_words'].head(2)

1    58
2    30
Name: Review_total_words, dtype: int64

## Rewrite the following as a lambda function

Create a new column `'Review_length'` by applying this lambda function to the `'Review_total_words'` column.

In [16]:
# Rewrite the following function as a lambda function
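def rewrite_as_lambda(value):
    # value is a word count (an integer), so compare it directly;
    # calling len() on an integer would raise a TypeError
    if value < 50:
        return 'short'
    elif value < 80:
        return 'medium'
    else:
        return 'long'
# Hint: nest your if, else conditionals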

df['Review_length'] = df['Review_total_words'].map(lambda x: 'short' if x<50 else ('medium' if x<80 else 'long'))
print(df['Review_length'].head(2))
print('\n')
print(df['Review_length'].value_counts())
print('\n')
df['Review_length'].value_counts(normalize=True)

1    medium
2     short
Name: Review_length, dtype: object


short     1287
long       769
medium     554
Name: Review_length, dtype: int64




short     0.493103
long      0.294636
medium    0.212261
Name: Review_length, dtype: float64
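
One way to check that a nested conditional lambda matches the original function is to compare the two at the boundary values. This is a sketch using lowercase labels as in the output above; the names `label_length` and `label_lambda` are illustrative.

```python
# Named-function version of the length labelling
def label_length(n_words):
    if n_words < 50:
        return 'short'
    elif n_words < 80:
        return 'medium'
    else:
        return 'long'

# Lambda version: the inner conditional sits in the outer `else` branch
label_lambda = lambda n: 'short' if n < 50 else ('medium' if n < 80 else 'long')

# Compare both versions around the cut-offs at 50 and 80 words
for n in [0, 49, 50, 79, 80, 500]:
    assert label_lambda(n) == label_length(n)

boundary_labels = [label_lambda(n) for n in [49, 50, 79, 80]]
print(boundary_labels)
```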

## Level Up: Dates Advanced!
<img src="images/world_map.png" width="600">  

Print the first five rows of the `'date'` column. 

In [17]:
# Your code here
print(df['date'].head())
df['date'].describe()

1     2012-11-13
2     2014-10-23
4     2014-09-05
5     2011-02-25
10    2016-06-15
Name: date, dtype: object


count           2610
unique          1330
top       2013-06-02
freq              17
Name: date, dtype: object

In [18]:
df['date']=pd.to_datetime(df['date'])

In [21]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2610 entries, 1 to 4206
Data columns (total 12 columns):
business_id           2610 non-null object
cool                  2610 non-null int64
date                  2610 non-null datetime64[ns]
funny                 2610 non-null int64
review_id             2610 non-null object
stars                 2610 non-null int64
text                  2610 non-null object
useful                2610 non-null int64
user_id               2610 non-null object
stars_squared         2610 non-null int64
Review_total_words    2610 non-null int64
Review_length         2610 non-null object
dtypes: datetime64[ns](1), int64(6), object(5)
memory usage: 265.1+ KB


Overwrite the `'date'` column by reordering the month and day from `YYYY-MM-DD` to `DD-MM-YYYY`. Try to do this using a lambda function.

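The `.dt` accessor works on the whole Series of datetimes, not on an individual element. Inside a lambda, each element is a single `Timestamp`, so call its methods (such as `.strftime()`) directly.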

In [24]:
# Your code here
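# Trial run: '%d/%m/%y' gives DD/MM/YY with a two-digit year;
# '%d-%m-%Y' would give the requested DD-MM-YYYY
df['date_trial']=df['date'].map(lambda x: x.strftime('%d/%m/%y'))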
df['date_trial']

1       13/11/12
2       23/10/14
4       05/09/14
5       25/02/11
10      15/06/16
          ...   
689     02/06/13
4874    14/08/16
564     14/06/16
3458    02/10/13
4206    15/08/16
Name: date_trial, Length: 2610, dtype: object
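
The sketch below compares the two ways of formatting a datetime column on made-up dates: the `.dt` accessor on the whole Series, and a lambda that formats each `Timestamp` separately. The `'%d-%m-%Y'` pattern is an assumption chosen to match the requested `DD-MM-YYYY` layout.

```python
import pandas as pd

# Made-up dates, converted to datetime64 values
dates = pd.to_datetime(pd.Series(["2012-11-13", "2014-09-05"]))

# Whole-Series approach: .dt exposes datetime methods on the Series
via_dt = dates.dt.strftime("%d-%m-%Y")

# Element-wise approach: each ts passed to the lambda is a single Timestamp
via_map = dates.map(lambda ts: ts.strftime("%d-%m-%Y"))

print(via_dt.tolist())
print(via_dt.tolist() == via_map.tolist())
```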

In [38]:
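# str(x) lets the slicing work whether x is a 'YYYY-MM-DD' string or a Timestamp
df['dates_normal']=df['date'].map(lambda x: f'{str(x)[8:10]}-{str(x)[5:7]}-{str(x)[0:4]}')
print(df['dates_normal'].head(2))
# State the format explicitly so that a value like '05-09-2014' is read day first
pd.to_datetime(df['dates_normal'], format='%d-%m-%Y').head()

1    13-11-2012
2    23-10-2014
Name: dates_normal, dtype: object


1    2012-11-13
2    2014-10-23
4    2014-09-05
5    2011-02-25
10   2016-06-15
Name: dates_normal, dtype: datetime64[ns]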
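
One thing to watch when converting day-first strings back to datetimes: a value such as `05-09-2014` is ambiguous, and `pd.to_datetime` reads it month first by default. A minimal sketch on made-up strings, passing an explicit `format` to remove the ambiguity:

```python
import pandas as pd

# Made-up day-first strings; both are ambiguous without a format
day_first = pd.Series(["05-09-2014", "02-10-2013"])

# Default parsing of a single ambiguous string treats it as month-day-year
default_month = pd.to_datetime("05-09-2014").month

# An explicit format tells pandas the first field is the day
parsed = pd.to_datetime(day_first, format="%d-%m-%Y")
parsed_months = parsed.dt.month.tolist()

print(default_month)
print(parsed_months)
```

Passing `dayfirst=True` is another option, but an explicit `format` leaves less room for guessing.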

## Summary

Great! Hopefully, you're getting the hang of lambda functions now! It's important not to overuse them: it will often make more sense to define a named function so that it's reusable elsewhere. But whenever you need to quickly apply some simple processing to a collection of data, you now have a technique that will help you do just that. It'll also be useful when you're reading someone else's code that happens to use lambdas.