## Transformations

### Map(): Single-Column Transformations

Pandas is very handy for applying transformation rules to columns. The simplest approach is the `map()` method, which transforms each value within a single column:

```python
import pandas as pd

# read data from csv
flights = pd.read_csv('../data/flights.csv', header=0)

def decode_airline(value: str) -> str:
    # map airline codes to full airline names; unknown codes become 'Other'
    mapper = {
        'AA': 'American Airlines',
        'AS': 'Alaska Airlines',
        'DL': 'Delta Air Lines',
        'UA': 'United Airlines',
        'WN': 'Southwest Airlines',
    }
    return mapper.get(value, 'Other')

# decode airline names and assign to a new column
flights['airline_name'] = flights.airline.map(decode_airline)

# show decoded flights (only rows with a known airline)
flights.loc[flights.airline_name != 'Other', ['airline', 'airline_name', 'src', 'dest']]
```
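
`Series.map()` also accepts a dictionary directly. The sketch below uses a few hypothetical rows (made up for illustration, not read from `flights.csv`) and stores them in `sample_flights` so the notebook's `flights` variable is not overwritten. It decodes the same codes without a helper function. Codes missing from the dictionary come back as `NaN`, so `fillna('Other')` restores the fallback that `decode_airline` provided:

```python
import pandas as pd

# hypothetical sample rows, made up for illustration (not from flights.csv)
sample_flights = pd.DataFrame({'airline': ['AA', 'DL', 'B6', 'WN']})

airline_names = {
    'AA': 'American Airlines',
    'AS': 'Alaska Airlines',
    'DL': 'Delta Air Lines',
    'UA': 'United Airlines',
    'WN': 'Southwest Airlines',
}

# map() with a dict looks up each value; unmatched codes (here 'B6') become NaN,
# and fillna() replaces them with 'Other'
sample_flights['airline_name'] = sample_flights['airline'].map(airline_names).fillna('Other')
print(sample_flights)
```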


Here are two more ways to use `map()`. The first uses a named function to parse date strings, and the second uses a lambda to cast values to integers:

```python
from datetime import datetime, date

def decode_flightdate(value):
    # return the value unchanged if it is already a date; otherwise parse a 'YYYY-MM-DD' string
    if isinstance(value, date):
        return value
    return datetime.strptime(value, '%Y-%m-%d').date()

# re-assign flight_date as datetime.date objects
flights['flight_date'] = flights.flight_date.map(decode_flightdate)

# use a lambda function with map() to cast distance to int
flights['distance'] = flights.distance.map(lambda v: int(v))

print(flights.head(5))
```
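
For conversions like these, pandas also offers whole-column methods that avoid calling a Python function once per value. The following is a minimal sketch that makes two assumptions: dates are stored as `YYYY-MM-DD` strings, and `distance` has no missing values. The rows are hypothetical sample data held in `sample_flights`:

```python
import pandas as pd

# hypothetical sample rows, made up for illustration
sample_flights = pd.DataFrame({
    'flight_date': ['2023-01-05', '2023-01-06'],
    'distance': [1235.0, 296.0],
})

# pd.to_datetime() parses the whole column in one call;
# .dt.date converts the resulting Timestamps to datetime.date objects
sample_flights['flight_date'] = pd.to_datetime(
    sample_flights['flight_date'], format='%Y-%m-%d').dt.date

# astype(int) casts the whole column, like map(lambda v: int(v)) does value by value
sample_flights['distance'] = sample_flights['distance'].astype(int)
print(sample_flights)
```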

### Apply(): Multi-Column Transformations

While the `.map()` method transforms values in a single column, the DataFrame `.apply()` method can compute a value from several columns at once. Use `.apply()` when a transformation depends on more than one column of the same row.

For example, the `encode_flight_key` function concatenates the `airline`, `flight_number`, `src`, and `dest` fields to build a flight key for each row:

```python
import pandas as pd

# read data from csv
flights = pd.read_csv('../data/flights.csv', header=0)

def encode_flight_key(row):
    # each row is passed as a Series; access columns with row.column_name
    flight_key = f"{row.airline}{row.flight_number}-{row.src}-{row.dest}"
    return flight_key

# apply a function to every row:
# axis=1 passes each row to the function; axis=0 (the default) passes each column
flights['flight_key'] = flights.apply(encode_flight_key, axis=1)
flights['flight_key']
```

Pay attention to `axis=1`, which tells pandas to call the function once per row and pass it that row's values. `axis=0` (the default) calls the function once per column and passes it that column's values. See the [DataFrame.apply](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html) documentation for more information.
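
A small numeric sketch makes the difference between the two axes visible. The frame `df` below is hypothetical sample data. With `axis=0` the lambda receives whole columns, and with `axis=1` it receives one row at a time:

```python
import pandas as pd

# hypothetical numeric sample, made up for illustration
df = pd.DataFrame({'distance': [300, 1500, 2500], 'flight_time': [60, 200, 320]})

# axis=0 (the default): the function receives each COLUMN as a Series
col_totals = df.apply(lambda col: col.sum(), axis=0)

# axis=1: the function receives each ROW as a Series indexed by column name
row_totals = df.apply(lambda row: row['distance'] + row['flight_time'], axis=1)

print(col_totals.tolist())  # [4300, 580]
print(row_totals.tolist())  # [360, 1700, 2820]
```

With `axis=0` the result has one entry per column, and with `axis=1` it has one entry per row.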

Pandas passes each row (as a Series) as the first argument to the function. If your function needs more parameters, you can pass them positionally through the `args` parameter. For example:

```python
# passing more parameters to the apply function by position
def encode_flight_key(row, key_type):
    # each row is passed as a Series; access columns with row.column_name
    if key_type == "short":
        flight_key = f"{row.airline}{row.flight_number}-{row.src}-{row.dest}"
    else:
        # the long key also includes the flight date
        flight_key = f"{row.flight_date}-{row.airline}{row.flight_number}-{row.src}-{row.dest}"
    return flight_key

# apply a function to every row, passing key_type as an extra positional argument
flights['flight_key'] = flights.apply(encode_flight_key, axis=1, args=("short",))
flights['flight_key_long'] = flights.apply(encode_flight_key, axis=1, args=("long",))

# show both keys
flights[['flight_key', 'flight_key_long']]
```
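
`DataFrame.apply()` also forwards any extra keyword arguments to the function, so `key_type` can be passed by name instead of through `args`. This minimal sketch uses a single hypothetical row in `sample_flights`. It gives `key_type` a default value, which is an illustrative choice not taken from the original function:

```python
import pandas as pd

# hypothetical sample row, made up for illustration
sample_flights = pd.DataFrame({
    'flight_date': ['2023-01-05'],
    'airline': ['AA'],
    'flight_number': [100],
    'src': ['JFK'],
    'dest': ['LAX'],
})

def encode_flight_key(row, key_type="short"):
    key = f"{row.airline}{row.flight_number}-{row.src}-{row.dest}"
    # the long form prefixes the flight date
    return key if key_type == "short" else f"{row.flight_date}-{key}"

# keyword arguments not recognized by apply() itself are passed on to the function
sample_flights['flight_key'] = sample_flights.apply(encode_flight_key, axis=1, key_type="short")
sample_flights['flight_key_long'] = sample_flights.apply(encode_flight_key, axis=1, key_type="long")
print(sample_flights[['flight_key', 'flight_key_long']])
```

Passing arguments by name keeps the call readable when a function takes several extra parameters.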

### Complex Transformations: Multiple Output Columns

The example below applies a function over multiple columns and produces multiple new columns in a DataFrame.

It creates two new boolean columns, `is_commuter` and `is_long_distance`, based on each flight's duration (`flight_time`, in minutes) and distance (in miles).


```python
def encode_flight_type(row):
    # commuter: distance less than 300 miles and flight time less than 90 mins
    # long distance: distance greater than 1500 miles and flight time over 3 hours
    is_commuter = row.distance < 300.0 and row.flight_time < 90.0
    is_long_distance = row.distance > 1500.0 and row.flight_time > 180.0
    # return a tuple with one value per new column
    return (is_commuter, is_long_distance)

# apply returns a Series of tuples; zip(*...) transposes it into
# one sequence per new column, which are unpacked into the two columns
flights['is_commuter'], flights['is_long_distance'] = zip(*flights.apply(
                                                        encode_flight_type, axis=1))

# show commuter flights only
flights.loc[flights.is_commuter]
```
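
As an alternative to unpacking tuples with `zip()`, `DataFrame.apply()` accepts `result_type='expand'`, which turns each returned tuple into the columns of a new DataFrame. This sketch uses hypothetical rows in `sample_flights`. It renames the expanded columns, which are labeled `0` and `1` by default, and joins them back on the shared index:

```python
import pandas as pd

# hypothetical sample rows, made up for illustration
sample_flights = pd.DataFrame({
    'distance': [250.0, 2475.0, 800.0],
    'flight_time': [55.0, 330.0, 120.0],
})

def encode_flight_type(row):
    is_commuter = row.distance < 300.0 and row.flight_time < 90.0
    is_long_distance = row.distance > 1500.0 and row.flight_time > 180.0
    return is_commuter, is_long_distance

# result_type='expand' turns each returned tuple into a row of a new DataFrame
flags = sample_flights.apply(encode_flight_type, axis=1, result_type='expand')
flags.columns = ['is_commuter', 'is_long_distance']

# join aligns on the index, so each flag lands on its original row
sample_flights = sample_flights.join(flags)
print(sample_flights)
```

For threshold checks this simple, vectorized comparisons such as `(df.distance < 300) & (df.flight_time < 90)` would also work. The `apply()` form is more useful when the per-row logic is harder to express column-wise.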