# Data Transformations
This tutorial explains several methods for data transformation. Transforming data makes it easier to analyze, and it lets you specify which columns, rows, and other features to include in a dataset. We will learn how to subset, slice, sort, drop rows with missing values, and use several other commands that clean up a data set.

# Lesson 1: Subsetting & Slicing
Before we can run any examples, we must import the required packages.

In [None]:
import pandas as pd
from dplython import diamonds 
import numpy as np

## Subsetting 1.1
Subsetting a Pandas data frame is the process of selecting a set of specific rows and columns from a given data frame.
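
As a minimal sketch of subsetting, the block below builds a small made-up data frame and selects columns by name, rows by a condition, and both at once with `.loc`. The `gems` name and its values are illustrative assumptions, not taken from the `diamonds` data.

```python
import pandas as pd

# Hypothetical toy data, used only for illustration
gems = pd.DataFrame({
    "carat": [0.23, 0.31, 0.90, 1.20],
    "cut": ["Ideal", "Good", "Premium", "Ideal"],
    "price": [326, 335, 2760, 5000],
})

# Select a subset of columns by passing a list of column names
carat_price = gems[["carat", "price"]]

# Select a subset of rows with a boolean condition
ideal_only = gems[gems["cut"] == "Ideal"]

# Select rows and columns together with .loc[row_condition, column_list]
big_gems = gems.loc[gems["carat"] > 0.5, ["cut", "price"]]
```

Each selection returns a new data frame and leaves `gems` unchanged.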


## Slicing 1.2
Slicing extracts a specific portion of a list. To slice, first write the name of the list, `a`, then specify in square brackets which values you want. A slice takes a starting position, an ending position, and an optional stride. For example, `a[m:n]`:

- starts at position m
- goes up to, but does not include, position n
- can also use negative indices, which count from the end of the list
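
The slicing rules above can be sketched on a short list. The list `a` and its values are made up for illustration; positions are counted from 0.

```python
a = [10, 20, 30, 40, 50, 60]

first_three = a[0:3]    # positions 0, 1, 2 (position 3 is excluded)
middle = a[2:5]         # positions 2, 3, 4
every_other = a[0:6:2]  # stride of 2: positions 0, 2, 4
last_two = a[-2:]       # negative index counts from the end of the list
backwards = a[::-1]     # negative stride walks the list in reverse
```

Slicing always returns a new list, so `a` itself is not modified.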

# Lesson 2: Removing NA's
The `dropna()` method removes rows that contain a missing (NA/NaN) value. Below is an example of a data frame with NA values. We will drop every row where at least one value is NA.

In [None]:
# Example Data frame with NA values
batFamily = pd.DataFrame({"name": ['Alfred', 'Batman', 'Catwoman', 'Robin'],
                   "toy": [np.nan, 'Batmobile', 'Bullwhip', 'Birdarang'],
                   "born": [np.nan, pd.Timestamp("1915-04-17"), np.nan, pd.Timestamp("1940-03-06")]})
batFamily

In [None]:
# Dropping the rows with NA values
batFamily.dropna()
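
`dropna()` also accepts options that control which rows count as incomplete. The sketch below uses a made-up `heroes` data frame (an assumption for illustration) to compare the default behaviour with the `subset` and `how` parameters.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, used only for illustration
heroes = pd.DataFrame({
    "name": ["Alfred", "Batman", "Catwoman"],
    "toy": [np.nan, "Batmobile", "Bullwhip"],
    "born": [np.nan, 1915.0, np.nan],
})

any_na = heroes.dropna()                  # default how="any": drop rows with at least one NA
toy_only = heroes.dropna(subset=["toy"])  # only check the "toy" column for NA
all_na = heroes.dropna(how="all")         # drop a row only when every value in it is NA
```

Note that `dropna()` returns a new data frame; the original is left as it was unless you assign the result back.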

# Lesson 3: Sorting
The list method `sort()` sorts a list in ascending order by default. You can also pass a function as `key` to decide the sorting criteria, and set `reverse=True` to sort in descending order.

The `sort_values()` method sorts the rows of a data frame in ascending or descending order based on the column of your choosing.

In [None]:
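carats = diamonds['carat'].tolist()  # plain Python list of the carat values
carats.sort(reverse=False)  # sort in place, ascending (the default)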

In [None]:
# Sorting the diamonds data frame by carat from least to greatest
diamonds.sort_values(by=['carat'])
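
As a small sketch of both kinds of sorting, the block below sorts a plain list with a `key` function and `reverse`, then sorts a made-up data frame in both directions. The `words` and `gems` names and values are illustrative assumptions.

```python
import pandas as pd

# Sorting a plain Python list
words = ["pear", "fig", "banana"]
words.sort()                                          # in place, ascending (alphabetical)
by_length = sorted(words, key=len)                    # key function sets the criterion
longest_first = sorted(words, key=len, reverse=True)  # reverse=True gives descending order

# Sorting a data frame by a column (hypothetical toy data)
gems = pd.DataFrame({"carat": [0.90, 0.23, 1.20], "price": [2760, 326, 5000]})
ascending = gems.sort_values(by="carat")
descending = gems.sort_values(by="carat", ascending=False)
```

`list.sort()` changes the list in place, while `sorted()` and `sort_values()` return sorted copies.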

# Lesson 4: Groupby
Most data operations are done on groups defined by variables. The `groupby()` method takes an existing data frame and returns a grouped object on which operations are performed "by group".

In [None]:
diamondCuts = diamonds.groupby('cut')

diamondCuts
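
A grouped object is mostly useful once an operation is applied to it. As a rough sketch on a made-up data frame (the `gems` data is an assumption for illustration), the block below counts the rows in each group and computes a per-group mean.

```python
import pandas as pd

# Hypothetical toy data, used only for illustration
gems = pd.DataFrame({
    "cut": ["Ideal", "Good", "Ideal", "Good", "Premium"],
    "price": [300, 400, 500, 600, 700],
})

by_cut = gems.groupby("cut")
counts = by_cut.size()               # number of rows in each group
mean_price = by_cut["price"].mean()  # mean price within each group
```

Both results are indexed by the group label, so `mean_price["Good"]` looks up the value for one group.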

# Lesson 5: Aggregate
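Here we will learn how to collapse a data frame into summary values.
In pandas, the variables to group by are passed to `groupby()` as a list, and an aggregation such as `mean()` or `aggregate()` then collapses each group to a single row. You can use other commands, such as `pivot_table()`, for a similar process.

In [None]:
# Example: mean of the numeric columns for each (depth, price) combination
diamonds.groupby(['depth', 'price']).mean(numeric_only=True)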
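
In pandas, a comparable collapse can be sketched with `groupby()` followed by `agg()`. The block below uses a made-up `gems` data frame, and the output column names `mean_depth` and `max_price` are chosen here for illustration only.

```python
import pandas as pd

# Hypothetical toy data, used only for illustration
gems = pd.DataFrame({
    "cut": ["Ideal", "Ideal", "Good", "Good"],
    "color": ["E", "E", "E", "J"],
    "depth": [61.0, 62.0, 63.0, 64.0],
    "price": [300, 500, 400, 600],
})

# Group by two variables (given as a list) and collapse each group to one row
summary = (
    gems.groupby(["cut", "color"], as_index=False)
        .agg(mean_depth=("depth", "mean"), max_price=("price", "max"))
)
```

Passing `as_index=False` keeps `cut` and `color` as ordinary columns in the result instead of moving them into the index.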

# Lesson 6: Concat
To concatenate strings in Python, we can use `join()`, which takes the separator and any number of input strings.

We can also concatenate data frames using `pd.concat()`, but we must note that this command aligns the data frames by column name, so a column missing from one data frame is filled with NA values.

In [None]:
# join() example
str1 = 'Hello'
str2 = 'World!'

string = ' '.join([str1, str2])
string

In [None]:
# pd.concat() example
apple = pd.DataFrame({'a': [0, 1, 2], 'b': [3, 4, 5]})
grape = pd.DataFrame({'a': [9, 10, 11], 'b': [12, 13, 14]})

new = pd.concat([apple, grape], ignore_index=True)
new
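
As a rough pandas sketch of what happens when the column sets do and do not match, the block below stacks made-up frames with `pd.concat()`. The names `north`, `south`, and `extra` are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical toy data, used only for illustration
north = pd.DataFrame({"a": [0, 1, 2], "b": [3, 4, 5]})
south = pd.DataFrame({"a": [9, 10, 11], "b": [12, 13, 14]})
extra = pd.DataFrame({"a": [20], "c": [21]})  # has a column "c" but no column "b"

# Same columns: rows are stacked and the index is renumbered
stacked = pd.concat([north, south], ignore_index=True)

# Different columns: the result has the union of columns, with NA where a value is missing
mismatched = pd.concat([north, extra], ignore_index=True)
```

Checking `mismatched.isna().sum()` after a concatenation is a quick way to spot columns that did not line up.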