# Basic operations with data frames
---

## 1. Mathematical Operations

You can multiply, divide, subtract, and add columns of numeric data types. The operation is applied value by value, pairing up rows that have the same index.

- Let's say you want to multiply two columns:
```python
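# Assumes `stats` was loaded with pd.read_csv('demographic.csv') and its columns renamed
stats.BirthRate * stats.InternetUsers
# You can also store the result in a new object
result = stats.BirthRate * stats.InternetUsers
# The same works for subsets (rows 8 to 14; the end of the slice is excluded)
stats.BirthRate[8:15] * stats.InternetUsers[8:15]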
```
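A self-contained sketch of the same kind of column arithmetic, using a small made-up DataFrame (the numbers are hypothetical; only the column names come from `stats`). It also illustrates a detail that matters when you multiply subsets: pandas pairs values by index label, not by position, so subsets with different row labels produce `NaN`. `.iloc` is used for the row slices to make the positional slicing explicit.

```python
import pandas as pd

# Hypothetical toy data: only the column names are borrowed from `stats`
stats = pd.DataFrame({
    "BirthRate": [10.0, 20.0, 30.0, 40.0],
    "InternetUsers": [50.0, 60.0, 70.0, 80.0],
})

# Element-wise multiplication: each row is combined with the same row of the other column
product = stats.BirthRate * stats.InternetUsers
print(product.tolist())  # [500.0, 1200.0, 2100.0, 3200.0]

# Subsets are matched by index label, not by position:
# rows 0-1 of one column times rows 1-2 of the other only overlap at label 1
aligned = stats.BirthRate.iloc[0:2] * stats.InternetUsers.iloc[1:3]
print(aligned)  # labels 0 and 2 become NaN, label 1 is 20.0 * 60.0

# Convert to NumPy arrays to combine the values purely by position instead
positional = stats.BirthRate.iloc[0:2].to_numpy() * stats.InternetUsers.iloc[1:3].to_numpy()
print(positional.tolist())  # [600.0, 1400.0]
```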
___ 

## 2. Adding a new column to your data frame 

### Use `dataframe['nameOfNewColumn'] = values`

```python
stats['MyCalc'] = stats.BirthRate * stats.InternetUsers
# Print the first five values of the new column
stats.MyCalc.head()
# Output:
# 0    808.2516
# 1    207.9927
# 2    878.3135
# 3    736.5644
# 4    971.8720
# Name: MyCalc, dtype: float64
```    
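A minimal sketch of two ways to add a column, assuming a two-row frame built from the first two rows of `stats` shown above. Square brackets modify the data frame itself, while `DataFrame.assign()` returns a new data frame; the column names `Ratio` and `Source` are hypothetical.

```python
import pandas as pd

# Hypothetical two-row frame; the values are the first two rows shown in the notes
stats = pd.DataFrame({
    "BirthRate": [10.244, 35.253],
    "InternetUsers": [78.9, 5.9],
})

# Square-bracket assignment adds the column to `stats` itself
stats["MyCalc"] = stats.BirthRate * stats.InternetUsers

# assign() returns a new DataFrame and leaves `stats` unchanged
with_ratio = stats.assign(Ratio=stats.InternetUsers / stats.BirthRate)

# A single scalar value is repeated for every row
stats["Source"] = "demographic.csv"

print(stats.columns.tolist())       # ['BirthRate', 'InternetUsers', 'MyCalc', 'Source']
print(with_ratio.columns.tolist())  # ['BirthRate', 'InternetUsers', 'MyCalc', 'Ratio']
```

Assigning through an attribute (`stats.NewCol = ...`) does not create a column: pandas sets a plain Python attribute and emits a `UserWarning`, so use square brackets when adding columns.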

___
> **Difference between R and Python (pandas): recycling of vectors**
>
> - In R, shorter vectors are *recycled*.
>
> What does that mean?
> When R combines two vectors of different lengths, it repeats the shorter one until it is as long as the longer one.
>
> - In Python (pandas), you cannot do that with a list. Below is an example.
> `stats` is a data frame with `len(stats) = 195` rows. Now let's create a new column, but this time from only 6 values (not 195):

```python
stats['NewCol'] = [1, 2, 3, 4, 5, 6]
```

> **ValueError:** Length of values (6) does not match length of index (195)
>
> In R, assigning a shorter vector to a data frame column recycles it, but only when the number of rows is a multiple of the vector's length. With 195 rows, 5 values would be recycled; 6 values would raise an error in R too.
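A sketch of what pandas does accept, on a hypothetical 8-row frame: a single scalar is repeated for every row, while a list of the wrong length raises the `ValueError` above. To imitate recycling you can repeat the values yourself; `numpy.resize` is one way to do that (the column names `Flag` and `NewCol` are illustrative).

```python
import numpy as np
import pandas as pd

# Hypothetical 8-row frame (the real `stats` has 195 rows)
stats = pd.DataFrame({"BirthRate": np.arange(8, dtype=float)})

# A single scalar is broadcast to every row
stats["Flag"] = 1

# A list whose length differs from the number of rows is rejected
try:
    stats["NewCol"] = [1, 2, 3]
    error_message = None
except ValueError as err:
    error_message = str(err)
print(error_message)

# To imitate R-style recycling, repeat the values explicitly:
# np.resize fills the requested length with repeated copies of the input
stats["NewCol"] = np.resize([1, 2, 3], len(stats))
print(stats["NewCol"].tolist())  # [1, 2, 3, 1, 2, 3, 1, 2]
```

Unlike R's data frame assignment, `np.resize` does not check that the number of rows is a multiple of the vector length; here 8 rows are filled from 3 values and the last copy is simply cut short.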
___

## 3. Removing a column from a data frame

### Use `object_name.drop('column_name', axis=1)`

```python
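# axis=1 selects columns (axis=0, the default, selects rows).
# pandas 2.0 and later no longer accept the axis as a positional argument.
stats.drop('MyCalc', axis=1)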
```

__Note:__ Your existing dataframe (stats) will still contain that column. 

Why?

**`DataFrame.drop()` returns a new object with the column or row removed; the original is left unchanged.**

Solution?

Overwrite: store the result back in the same object.

```python
stats = stats.drop('MyCalc', axis=1)

# Now, when you print stats, the column will not be present
```
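A self-contained sketch on a hypothetical two-row frame showing that `drop()` leaves the original untouched until you reassign the name. It uses the `columns=` keyword, which is equivalent to passing the label with `axis=1`.

```python
import pandas as pd

# Hypothetical toy frame with the same column names as in the notes
stats = pd.DataFrame({
    "BirthRate": [10.244, 35.253],
    "InternetUsers": [78.9, 5.9],
})
stats["MyCalc"] = stats.BirthRate * stats.InternetUsers

# drop() returns a new DataFrame; `stats` keeps all of its columns
trimmed = stats.drop(columns="MyCalc")
still_there = "MyCalc" in stats.columns
print(still_there)                  # True
print("MyCalc" in trimmed.columns)  # False

# Reassigning the name keeps the result
stats = stats.drop(columns="MyCalc")
print("MyCalc" in stats.columns)    # False
```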
**To change the existing data frame directly instead, use `inplace=True`**


```python
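stats.drop('MyCalc', axis=1, inplace=True)

# No reassignment needed: the change is saved in `stats` itself.
# drop() returns None here, so don't write `stats = stats.drop(..., inplace=True)`.
# If MyCalc was already dropped by the previous cell, this line raises a KeyError.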
```
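A sketch of the `inplace=True` behaviour on a hypothetical one-row frame (the temporary column `Honey` mirrors the one in the notebook cells). It shows that the method returns `None`, and that running the same drop twice raises a `KeyError` because the column is already gone.

```python
import pandas as pd

# Hypothetical one-row frame with a temporary column
stats = pd.DataFrame({"BirthRate": [10.244], "InternetUsers": [78.9]})
stats["Honey"] = stats.InternetUsers / stats.BirthRate

# With inplace=True, `stats` itself is changed and drop() returns None
result = stats.drop("Honey", axis=1, inplace=True)
print(result)                    # None
print("Honey" in stats.columns)  # False

# Running the same line a second time fails: the column no longer exists
try:
    stats.drop("Honey", axis=1, inplace=True)
    rerun_failed = False
except KeyError:
    rerun_failed = True
print(rerun_failed)              # True

# Pitfall: do not combine inplace=True with reassignment.
# `stats = stats.drop("BirthRate", axis=1, inplace=True)` would set stats to None.
```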

Parameters
----------
- labels: a single label or a list of labels (row index labels or column names) to drop.
- axis: `0` or `'index'` for rows (the default), `1` or `'columns'` for columns.
- index or columns: a single label or a list. An alternative to `labels` plus `axis` (`columns='MyCalc'` is the same as `'MyCalc', axis=1`); they cannot be combined with `labels`.
- level: for a data frame with a multi-level index (MultiIndex), the level from which labels are removed.
- **inplace: if True, modifies the original data frame and returns None.**
- errors: `'raise'` (the default) raises a KeyError for labels that don't exist; `'ignore'` skips them and drops the remaining labels.

Return type: a DataFrame with the specified labels removed, or None if `inplace=True`.
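A short sketch of the parameters listed above, on three rows copied from the `stats` output in these notes: dropping a row by its index label, dropping several columns at once, and ignoring a missing label with `errors='ignore'` (`NotAColumn` is a deliberately nonexistent name).

```python
import pandas as pd

# Three rows taken from the output shown in the notes
stats = pd.DataFrame({
    "CountryName": ["Aruba", "Afghanistan", "Angola"],
    "BirthRate": [10.244, 35.253, 45.985],
    "InternetUsers": [78.9, 5.9, 19.1],
})

# Rows are dropped by index label; axis=0 is the default
without_first = stats.drop(0)

# Several columns at once: a list of labels with axis=1
names_only = stats.drop(["BirthRate", "InternetUsers"], axis=1)

# errors="ignore" skips labels that do not exist instead of raising KeyError
partial = stats.drop(columns=["NotAColumn", "BirthRate"], errors="ignore")

print(without_first.index.tolist())  # [1, 2]
print(names_only.columns.tolist())   # ['CountryName']
print(partial.columns.tolist())      # ['CountryName', 'InternetUsers']
```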

----------
**Tip: getting more information about a function**

Type the function name followed by an opening parenthesis in a cell, e.g. `stats.drop(`. With the cursor inside the parentheses, press **Shift + Tab**: a tooltip with the function's signature and documentation appears.
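Outside Jupyter, for example in a plain Python shell, the built-in `help()` prints essentially the same documentation. In IPython and Jupyter, `pd.DataFrame.drop?` also opens it.

```python
import pandas as pd

# Prints the signature and docstring of DataFrame.drop,
# roughly the same information as the Shift + Tab tooltip in Jupyter
help(pd.DataFrame.drop)
```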
___

Notebook cells:

```python
# In [None]:
import pandas as pd
import os

os.getcwd()  # check the working directory: demographic.csv must be in it
stats = pd.read_csv('demographic.csv')

stats

# Rename the columns so that no name contains spaces
stats.columns = ['CountryName', 'CountryCode', 'BirthRate', 'InternetUsers',
                 'IncomeGroup']

stats

# Revision of yesterday's session: subsetting
stats[2:5]                                      # rows 2, 3 and 4 (three rows)
stats.CountryName                               # a single column; attribute access only works for names without spaces
stats[['BirthRate', 'InternetUsers']]           # several columns
stats[['BirthRate', 'InternetUsers']][2:5]      # combining both
stats[['BirthRate', 'InternetUsers']].tail(20)  # head() and tail() also work on these subsets



# In [25]:
stats.BirthRate[8:15] * stats.InternetUsers[8:15]

# Output:
# 8     1095.60000
# 9      757.81672
# 10    1074.21000
# 11      57.39630
# 12     920.30624
# 13     178.55600
# 14     369.01410
# dtype: float64

# In [29]:
# 2. Adding a new column to your data frame
stats['MyCalc'] = stats.BirthRate * stats.InternetUsers

stats.MyCalc.head()

# Output:
# 0    808.2516
# 1    207.9927
# 2    878.3135
# 3    736.5644
# 4    971.8720
# Name: MyCalc, dtype: float64

# In [34]:
# Comparison with R
stats['New'] = list(range(0, 195))

stats.head()

# This works as well: range(0, 195) produces exactly 195 values,
# one for each row of the existing columns.

stats['Not'] = [1, 2, 3, 4, 5, 6]
# This does not create a new column. Why?
# A list must contain one value per row, and pandas does not recycle it.
# R recycles shorter vectors, but for a data frame column only when the
# number of rows is a multiple of the vector length (195 is not a multiple of 6).

# Output:
# ValueError: Length of values (6) does not match length of index (195)

# In [35]:
len(stats)

# Output:
# 195

# In [36]:
stats.head()

# Output:
#             CountryName  CountryCode  BirthRate  InternetUsers          IncomeGroup    MyCalc  New
# 0                 Aruba          ABW     10.244           78.9          High income  808.2516    0
# 1           Afghanistan          AFG     35.253            5.9           Low income  207.9927    1
# 2                Angola          AGO     45.985           19.1  Upper middle income  878.3135    2
# 3               Albania          ALB     12.877           57.2  Upper middle income  736.5644    3
# 4  United Arab Emirates          ARE     11.044           88.0          High income  971.8720    4


# In [57]:
stats.drop('MyCalc', axis=1)  # does not remove the column from the existing object; returns a new object

stats = stats.drop('MyCalc', axis=1)

stats

# Output (raised on the first line: MyCalc was no longer in `stats`,
# most likely because this cell had already been run once):
# KeyError: "['MyCalc'] not found in axis"

# In [64]:
stats

# Output:
#               CountryName  CountryCode  BirthRate  InternetUsers          IncomeGroup
# 0                   Aruba          ABW     10.244           78.9          High income
# 1             Afghanistan          AFG     35.253            5.9           Low income
# 2                  Angola          AGO     45.985           19.1  Upper middle income
# 3                 Albania          ALB     12.877           57.2  Upper middle income
# 4    United Arab Emirates          ARE     11.044           88.0          High income
# ...                   ...          ...        ...            ...                  ...
# 190           Yemen, Rep.          YEM     32.947           20.0  Lower middle income
# 191          South Africa          ZAF     20.850           46.5  Upper middle income
# 192      Congo, Dem. Rep.          COD     42.394            2.2           Low income
# 193                Zambia          ZMB     40.471           15.4  Lower middle income


# In [66]:
stats['Honey'] = stats.InternetUsers / stats.BirthRate

stats

# Output:
#               CountryName  CountryCode  BirthRate  InternetUsers          IncomeGroup     Honey
# 0                   Aruba          ABW     10.244           78.9          High income  7.702070
# 1             Afghanistan          AFG     35.253            5.9           Low income  0.167362
# 2                  Angola          AGO     45.985           19.1  Upper middle income  0.415353
# 3                 Albania          ALB     12.877           57.2  Upper middle income  4.442028
# 4    United Arab Emirates          ARE     11.044           88.0          High income  7.968127
# ...                   ...          ...        ...            ...                  ...       ...
# 190           Yemen, Rep.          YEM     32.947           20.0  Lower middle income  0.607036
# 191          South Africa          ZAF     20.850           46.5  Upper middle income  2.230216
# 192      Congo, Dem. Rep.          COD     42.394            2.2           Low income  0.051894
# 193                Zambia          ZMB     40.471           15.4  Lower middle income  0.380519


# In [69]:
# Let's delete this new column
stats.drop('Honey', axis=1, inplace=True)

stats

# Output (Honey had already been dropped, most likely by an earlier run of this cell):
# KeyError: "['Honey'] not found in axis"

# In [79]:
stats

# Output:
#               CountryName  CountryCode  BirthRate  InternetUsers          IncomeGroup
# 0                   Aruba          ABW     10.244           78.9          High income
# 1             Afghanistan          AFG     35.253            5.9           Low income
# 2                  Angola          AGO     45.985           19.1  Upper middle income
# 3                 Albania          ALB     12.877           57.2  Upper middle income
# 4    United Arab Emirates          ARE     11.044           88.0          High income
# ...                   ...          ...        ...            ...                  ...
# 190           Yemen, Rep.          YEM     32.947           20.0  Lower middle income
# 191          South Africa          ZAF     20.850           46.5  Upper middle income
# 192      Congo, Dem. Rep.          COD     42.394            2.2           Low income
# 193                Zambia          ZMB     40.471           15.4  Lower middle income


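# In [71]:
# `stats.backup = stats` does not create a column or a copy: pandas only sets
# a Python attribute and emits a UserWarning. To keep a backup, copy the
# data frame instead:
backup = stats.copy()
```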