<a href="https://colab.research.google.com/github/alexayanar/colab/blob/master/Copy_of_apply_a_function_to_every_row_in_a_pandas_dataframe_QTM350.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [0]:
import pandas as pd

## Use `.apply` to send a column of every row to a function

You can use `.apply` to send a single column to a function. This is useful when cleaning up data - converting formats, altering values etc.

In [29]:
# What's our data look like?
df = pd.read_csv("https://raw.githubusercontent.com/jeremyallenjacobson/RealRootsReproduction/master/train-1-10-2.csv", header=None)
df.head(10)

Unnamed: 0,0,1,2,3
0,8,6,10,0
1,4,9,8,0
2,6,2,5,0
3,3,6,6,0
4,5,10,8,0
5,10,2,6,0
6,8,8,9,0
7,5,9,1,1
8,9,10,5,0
9,5,3,5,0


Below we define the discriminant function which takes as input `x=[a,b,c]` a list of coefficients of a polynomial
$$
ax^2+bx+c
$$
and returns its mathematical discriminant
 $$b^2-4ac$$

In [0]:
def discriminant(x):
    return x[1]**2 - 4*x[2]*x[0]

In [0]:
def degree(x):
  if x[0] != 0:
    return 2
  if x[1] != 0:
    return 1
  else: 
    return 0

In [9]:
x = [0,2,3]
print(degree(x))


1


Now let's apply this function, using the previous columns as input. 

### How do we pass in the entries of the columns as input?
We select the first three columns using `loc` and use *slicing* by passing to the column index `0:2`. We pass `:` to the row index as we want all rows.

In [31]:
df.loc[:,0:2]

Unnamed: 0,0,1,2
0,8,6,10
1,4,9,8
2,6,2,5
3,3,6,6
4,5,10,8
...,...,...,...
695,5,3,1
696,4,8,3
697,5,5,8
698,3,1,10


Now, we apply our function to these first three columns, and save the result in a new column called `Discriminant`. It is aptly named, as its value in any row is simply the value of the mathematical discriminant of the polynomial determined by the coefficients in that row.

In [0]:
df['Discriminant'] = df.loc[:,:2].apply(discriminant, axis=)

In [0]:
df['Degree'] = df.loc[:, :2].apply(degree, axis=1)

In [0]:
df.to_csv('degree_df.csv')

In [17]:
# Take a peek
df.head(10)

Unnamed: 0,0,1,2,3,Degree
0,8,6,10,0,2
1,4,9,8,0,2
2,6,2,5,0,2
3,3,6,6,0,2
4,5,10,8,0,2
5,10,2,6,0,2
6,8,8,9,0,2
7,5,9,1,1,2
8,9,10,5,0,2
9,5,3,5,0,2


So what does the column named `3` represent? It indicates, with a 0 or 1, whether or not the polynomial in that row has a real root or not. Recall, a root is a value $x$ for which
$$ ax^2+bx+c =0$$

Notice, the only appearance of 1 in the `3` column in the sample above occurs when the `Discriminant` is positive. Indeed, this is a property of the discriminant. It is positive if and only if there is a non-zero real root.

## Use `.apply` with `axis=1` to send every single row to a function

You can also send an **entire row at a time** instead of just a single column. Use this if you need to use **multiple columns to get a result**.

In [0]:
# Create a dataframe from a list of dictionaries
rectangles = [
    { 'height': 40, 'width': 10 },
    { 'height': 20, 'width': 9 },
    { 'height': 3.4, 'width': 4 }
]

rectangles_df = pd.DataFrame(rectangles)
rectangles_df

Unnamed: 0,height,width
0,40.0,10
1,20.0,9
2,3.4,4


In [0]:
# Use the height and width to calculate the area
def calculate_area(row):
    return row['height'] * row['width']

rectangles_df.apply(calculate_area, axis=1)

0    400.0
1    180.0
2     13.6
dtype: float64

In [0]:
# Use .apply to save the new column if we'd like
rectangles_df['area'] = rectangles_df.apply(calculate_area, axis=1)
rectangles_df

Unnamed: 0,height,width,area
0,40.0,10,400.0
1,20.0,9,180.0
2,3.4,4,13.6


To save the new dataframe as a csv, we use the command below.

In [0]:
rectangles_df.to_csv('area.csv')

Then, we can see that our new file appears.

In [37]:
!ls
!git config --global user.email 'alexa.yanar@emory.edu'
!git config --global user.name 'ayanar'
!git clone https://github.com/alexayanar/colab.git
%cd dsc2/
!git add degree_df.csv
!git commit -m 'test yourself week 8'
!git push origin master 

adult.csv	  adult_sample_with_header.csv	exercise1-alexa.sh  README.md
adult.data	  colab				header.csv
adult_sample.csv  degree_df.csv			hw11
fatal: destination path 'colab' already exists and is not an empty directory.
[Errno 2] No such file or directory: 'dsc2/'
/content/dsc2
On branch master
Your branch is ahead of 'origin/master' by 1 commit.
  (use "git push" to publish your local commits)

Untracked files:
	[31mcolab/[m

nothing added to commit but untracked files present
fatal: could not read Username for 'https://github.com': No such device or address


In [35]:
from google.colab import drive
drive.mount('/content/drive')

Go to this URL in a browser: https://accounts.google.com/o/oauth2/auth?client_id=947318989803-6bn6qk8qdgf4n4g3pfee6491hc0brc4i.apps.googleusercontent.com&redirect_uri=urn%3aietf%3awg%3aoauth%3a2.0%3aoob&response_type=code&scope=email%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdocs.test%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive.photos.readonly%20https%3a%2f%2fwww.googleapis.com%2fauth%2fpeopleapi.readonly

Enter your authorization code:
··········
Mounted at /content/drive


fatal: not a git repository (or any of the parent directories): .git
fatal: not a git repository (or any of the parent directories): .git
fatal: not a git repository (or any of the parent directories): .git


From here, if we were running this notebook in sagemaker, it would be easy to copy this file to our S3 bucket using shell commands. Below are the instructions for that.

Alternatively, you can use git. Git clone your repo to this notebook instance, then commit and push the file area.csv.

#### Copying a local file to S3 (only works if AWS CLI installed)

Indeed, if you use sagemaker the AWS CLI comes preinstalled, so there would be no need to authenticate as we already gave our Sagemaker instance an IAM role allowing it to access all S3 buckets.  

The following cp command copies a single file to a specified bucket, here named 'mybucket' and key:

In [0]:
!aws s3 cp area.csv s3://mybucket/area.csv

For more AWS CLI commands for working with S3, see the examples in the reference [here](https://docs.aws.amazon.com/cli/latest/reference/s3/index.html).