# Pandas - DataFrame Part 1
## TOPICS:
What is a DataFrame?

1. CREATE
2. READ
3. UPDATE
4. DELETE





In [2]:
import pandas as pd
import numpy as np

## What is a DataFrame?
A DataFrame holds data in table form (like an Excel sheet), where all data points in the same column have the same type.

_Note: At this point, you don't have to worry if you don't understand the code below. We will learn about that shortly after this._

In [8]:
df = pd.DataFrame( {'Gold': [15,12,10], 'Silver': [11,9,5] , 'Bronze': [6,8,12], 
'Continent': ['Europe','Asia','Asia'] } , ['France','China','Thailand'])
df

| | Gold | Silver | Bronze | Continent |
|---|---|---|---|---|
| France | 15 | 11 | 6 | Europe |
| China | 12 | 9 | 8 | Asia |
| Thailand | 10 | 5 | 12 | Asia |
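
To make the "same type per column" idea concrete, here is a minimal sketch that rebuilds the same medal table and inspects its column types. The variable name `medals` is only illustrative and is not used elsewhere in this notebook.

```python
import pandas as pd

# Rebuild the medal table so that this cell runs on its own.
medals = pd.DataFrame(
    {'Gold': [15, 12, 10], 'Silver': [11, 9, 5], 'Bronze': [6, 8, 12],
     'Continent': ['Europe', 'Asia', 'Asia']},
    index=['France', 'China', 'Thailand'])

# Each column reports a single dtype: an integer type for the medal counts
# and a string/object type for Continent.
print(medals.dtypes)

# shape is (number of rows, number of columns).
print(medals.shape)
```

The medal columns share an integer type, while `Continent` is stored as text.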


## 1. CREATE
The simplest way to create a dataframe is from a dictionary. A dictionary holds key:value pairs, where each **key** is a column name and each **value** is a **list** of the data points in that column.

In [9]:
my_dt = {'Gold': [15,12,10], 'Silver': [11,9,5] , 'Bronze': [6,8,12], 
'Continent': ['Europe','Asia','Asia']}
df1 = pd.DataFrame(my_dt) 
df1

| | Gold | Silver | Bronze | Continent |
|---|---|---|---|---|
| 0 | 15 | 11 | 6 | Europe |
| 1 | 12 | 9 | 8 | Asia |
| 2 | 10 | 5 | 12 | Asia |


Notice that each row is identified by a number. This row identifier is called **the label index**. As the word **label** hints, **the label index** can hold strings as well.

In [10]:
my_dt = {'Gold': [15,12,10], 'Silver': [11,9,5] , 'Bronze': [6,8,12], 
'Continent': ['Europe','Asia','Asia']}
my_list = ['France', 'China', 'Thailand']
df2 = pd.DataFrame(my_dt, my_list) 
df2

| | Gold | Silver | Bronze | Continent |
|---|---|---|---|---|
| France | 15 | 11 | 6 | Europe |
| China | 12 | 9 | 8 | Asia |
| Thailand | 10 | 5 | 12 | Asia |
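
The row labels can also be passed with the `index=` keyword, which some readers find easier to follow than a positional argument. The sketch below does that and then inspects the row and column labels; `df_named` is an illustrative name.

```python
import pandas as pd

my_dt = {'Gold': [15, 12, 10], 'Silver': [11, 9, 5], 'Bronze': [6, 8, 12],
         'Continent': ['Europe', 'Asia', 'Asia']}

# index= names the row labels explicitly.
df_named = pd.DataFrame(my_dt, index=['France', 'China', 'Thailand'])

print(df_named.index.tolist())    # the row labels
print(df_named.columns.tolist())  # the column names (the dictionary keys)
```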


**Create a dataframe by manually setting each column with a series.**

In [11]:
df3 = pd.DataFrame()            # Start with empty dataframe
ser1 = pd.Series([15,11,6])     # Create a series from a list
ser2 = pd.Series([11,9,5])      # Create a series from a list
ser3 = pd.Series([10,5,2])      # Create a series from a list
ser4 = pd.Series(["Europe","Asia","Asia"])  # Create a series from a list

df3["Gold"] = ser1
df3["Silver"] = ser2
df3["Bronze"]= ser3
df3["Continent"] = ser4

my_index = pd.Series(["France","China","Thailand"]) # Create a series from a list
df3.set_index(my_index)

| | Gold | Silver | Bronze | Continent |
|---|---|---|---|---|
| France | 15 | 11 | 10 | Europe |
| China | 11 | 9 | 5 | Asia |
| Thailand | 6 | 5 | 2 | Asia |
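
One detail worth knowing is that `set_index()` returns a new dataframe and leaves the original untouched, so the result has to be assigned to a name to keep it. A minimal sketch, with illustrative names `df_cols`, `countries` and `df_indexed`:

```python
import pandas as pd

df_cols = pd.DataFrame()                      # start with an empty dataframe
df_cols["Gold"] = pd.Series([15, 12, 10])     # add columns one series at a time
df_cols["Continent"] = pd.Series(["Europe", "Asia", "Asia"])

countries = pd.Series(["France", "China", "Thailand"])

# set_index returns a new dataframe; df_cols itself keeps its numeric index.
df_indexed = df_cols.set_index(countries)

print(df_cols.index.tolist())     # the original numeric labels
print(df_indexed.index.tolist())  # the country labels
```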


## 2. READ
* 2.1 Subsetting columns
    * Subsetting one column
    * Subsetting multiple columns
* 2.2 Subsetting rows
    * Subsetting rows when numbers are used as the label index
    * Subsetting rows when labels are used as the label index
* 2.3 Subsetting both rows and columns
* 2.4 Accessing any particular element

### 2.1 Subsetting columns
**Subsetting one column:** We can select a single column by its name.
```
df[ 'column_name' ]
```

In [12]:
df1['Continent']

0    Europe
1      Asia
2      Asia
Name: Continent, dtype: object

**Subsetting multiple columns:** We can select multiple columns by passing a list of column names.
```
df[ ['col_name1', 'col_name2',..., 'col_namen'] ]
```

In [13]:
df1[ ['Gold','Continent'] ]

| | Gold | Continent |
|---|---|---|
| 0 | 15 | Europe |
| 1 | 12 | Asia |
| 2 | 10 | Asia |
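
The two forms return different types: a single column name gives a Series, while a list of names gives a DataFrame, even when the list holds only one name. A short sketch, using an illustrative dataframe `df_demo`:

```python
import pandas as pd

df_demo = pd.DataFrame({'Gold': [15, 12, 10], 'Silver': [11, 9, 5],
                        'Continent': ['Europe', 'Asia', 'Asia']})

one_col = df_demo['Gold']          # single name: a Series
one_col_df = df_demo[['Gold']]     # list with one name: a DataFrame

print(type(one_col).__name__)
print(type(one_col_df).__name__)
print(one_col_df.shape)            # (rows, columns)
```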


### 2.2 Subsetting rows
**Subsetting rows when numbers are used as the label index**

We can use **```loc[ index_label ]```** to subset a row.

The **index labels** of df1 are numbers, so we can subset rows by those numbers.

In [5]:
print(df1)


In [13]:
df1.loc[0]

Gold             15
Silver           11
Bronze            6
Continent    Europe
Name: 0, dtype: object

In [4]:
df1.loc[1]


In [None]:
df1.loc[1:3] # We can slice too; loc slices by label and includes the end label if it exists

**Subsetting rows when labels are used as the label index**

We can use **```loc[ 'label_index' ]```** to subset a row.

To select multiple rows, we can pass a list of labels.

_Note: df2 uses strings as the index labels._

In [14]:
df2

| | Gold | Silver | Bronze | Continent |
|---|---|---|---|---|
| France | 15 | 11 | 6 | Europe |
| China | 12 | 9 | 8 | Asia |
| Thailand | 10 | 5 | 12 | Asia |


In [15]:
df2.loc['China']

Gold           12
Silver          9
Bronze          8
Continent    Asia
Name: China, dtype: object

In [16]:
df2.loc[ ['China','France'] ]

| | Gold | Silver | Bronze | Continent |
|---|---|---|---|---|
| China | 12 | 9 | 8 | Asia |
| France | 15 | 11 | 6 | Europe |



**What if the dataframe uses strings as the label index but we need to subset many rows?**

Do we need to make a list of the countries we want?

There are 193 countries in the world, and suppose we want the top 50.

Typing a list of 50 country names is tedious.

We can still subset by row position using **```iloc[ ]```**.

In [None]:
#df2.loc[1:3] # ERROR. The labels are strings, not numbers
df2.iloc[1:3] # iloc saves the day
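
A point that often trips people up: a `loc` slice uses labels and includes the end label, while an `iloc` slice uses positions and excludes the end position. The sketch below compares the two on an illustrative dataframe `df_demo`.

```python
import pandas as pd

df_demo = pd.DataFrame(
    {'Gold': [15, 12, 10], 'Continent': ['Europe', 'Asia', 'Asia']},
    index=['France', 'China', 'Thailand'])

# Label slice: both 'China' and 'Thailand' are included.
by_label = df_demo.loc['China':'Thailand']

# Position slice: positions 1 and 2 are included, position 3 is excluded.
by_position = df_demo.iloc[1:3]

print(by_label.index.tolist())
print(by_position.index.tolist())

# Here the label slice covers two rows but the position slice only one.
print(len(df_demo.loc['France':'China']), len(df_demo.iloc[0:1]))
```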

### 2.3 Subsetting both rows and columns
We have just learned that we can use ```iloc[ ]``` to subset rows by position.

We can apply the same method to subset both rows and columns.

Normally, we slice with integer positions, where the end position is excluded.

**```df.iloc[ row_slicing , column_slicing ]```**

In [17]:
df2.iloc[ : , 0:1 ] # all rows, first column

| | Gold |
|---|---|
| France | 15 |
| China | 12 |
| Thailand | 10 |


In [None]:
df2.iloc[ : , 1:] # all rows, second column to the end

In [None]:
df2.iloc[ :2 , :] # first two rows, all columns

In [None]:
df2.iloc[ :2 , [0, 3] ] # first two rows, only the Gold and Continent
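
The same row-and-column selection can be written with labels by using `loc` instead of `iloc`, which is often easier to read when the column order is not memorised. A minimal sketch on an illustrative dataframe `df_demo`:

```python
import pandas as pd

df_demo = pd.DataFrame(
    {'Gold': [15, 12, 10], 'Silver': [11, 9, 5], 'Bronze': [6, 8, 12],
     'Continent': ['Europe', 'Asia', 'Asia']},
    index=['France', 'China', 'Thailand'])

# By label: rows France and China, columns Gold and Continent.
by_label = df_demo.loc[['France', 'China'], ['Gold', 'Continent']]

# By position: first two rows, columns 0 and 3.
by_position = df_demo.iloc[:2, [0, 3]]

print(by_label)
print(by_label.equals(by_position))  # both select the same cells
```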

### 2.4 Accessing any particular element
After we subset a single row, that row is actually a series. Therefore, we can access any element in it by its position or by its label.

In [18]:
x = df2.iloc[2]
x

Gold           10
Silver          5
Bronze         12
Continent    Asia
Name: Thailand, dtype: object

In [None]:
type(x)

In [None]:
x.iloc[2] # Access the number of Bronze medals by position (index 2)

In [None]:
x.iloc[3] # Access the Continent by position (index 3)
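
An element can also be reached by label, or read straight from the dataframe without first extracting a row. The sketch below shows both, using the `at` and `iat` accessors for single values; `df_demo` and `row` are illustrative names.

```python
import pandas as pd

df_demo = pd.DataFrame(
    {'Gold': [15, 12, 10], 'Silver': [11, 9, 5], 'Bronze': [6, 8, 12],
     'Continent': ['Europe', 'Asia', 'Asia']},
    index=['France', 'China', 'Thailand'])

row = df_demo.iloc[2]                       # the Thailand row, as a series

print(row['Bronze'])                        # by label within the row
print(row.iloc[2])                          # by position within the row

print(df_demo.at['Thailand', 'Bronze'])     # single cell by row/column label
print(df_demo.iat[2, 2])                    # single cell by row/column position
```

All four expressions read the same cell.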

## 3. UPDATE
* 3.1 Change existing value
* 3.2 Append another dataframe
* 3.3 Add another column

### 3.1 Change existing value
We can update an existing value by selecting its cell and assigning a new value to it.

In [None]:
df2

In [None]:
df2.iloc[1,2] = 999

In [None]:
df2
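
The same kind of update can be written with labels through `loc`, which avoids counting row and column positions. A minimal sketch on an illustrative dataframe `df_demo`; the value 999 mirrors the example above, and the France values are made up for illustration:

```python
import pandas as pd

df_demo = pd.DataFrame(
    {'Gold': [15, 12, 10], 'Silver': [11, 9, 5], 'Bronze': [6, 8, 12],
     'Continent': ['Europe', 'Asia', 'Asia']},
    index=['France', 'China', 'Thailand'])

# Update one cell by row label and column label.
df_demo.loc['China', 'Bronze'] = 999

# Update several cells in one row at once.
df_demo.loc['France', ['Gold', 'Silver']] = [16, 12]

print(df_demo)
```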

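### 3.2 Append another dataframe
We can use the `pd.concat()` function to append the rows of one dataframe to another. (The older `DataFrame.append()` method was removed in pandas 2.0.)

In [None]:
dfplus = pd.DataFrame( {'Gold': [3], 'Silver': [2] , 'Bronze': [0], 
'Continent': ['Europe'] } , ['England'])
dfplus

In [None]:
df3 = pd.concat([df2, dfplus]) # returns a new dataframe; df2 is unchanged
df3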

In [None]:
df2
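
As a self-contained sketch of appending rows, the example below stacks two dataframes with `pd.concat` and confirms that the inputs are left unchanged. The names `base`, `extra` and `combined` are illustrative.

```python
import pandas as pd

base = pd.DataFrame(
    {'Gold': [15, 12, 10], 'Silver': [11, 9, 5], 'Bronze': [6, 8, 12],
     'Continent': ['Europe', 'Asia', 'Asia']},
    index=['France', 'China', 'Thailand'])

extra = pd.DataFrame(
    {'Gold': [3], 'Silver': [2], 'Bronze': [0], 'Continent': ['Europe']},
    index=['England'])

# pd.concat stacks the rows and returns a new dataframe.
combined = pd.concat([base, extra])

print(combined)
print(len(base), len(combined))  # base still has its original rows
```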

### 3.3 Add another column
Adding another column is easy. We can assign a new series to a new column label.

In [None]:
df3["Abbreviation"] = pd.Series(['FR','CN','TH','UK'], index=df3.index)

In [None]:
df3
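
The `index=` argument matters here because pandas matches a series to the dataframe by index label, not by position. The sketch below, on an illustrative dataframe `df_demo`, shows what happens when the labels do not match and two ways to avoid it.

```python
import pandas as pd

df_demo = pd.DataFrame({'Gold': [15, 12, 10]},
                       index=['France', 'China', 'Thailand'])

# This series has labels 0, 1, 2, none of which match the country labels,
# so every value in the new column ends up missing (NaN).
df_demo['Misaligned'] = pd.Series(['FR', 'CN', 'TH'])

# Giving the series the dataframe's own index lines the values up.
df_demo['Abbreviation'] = pd.Series(['FR', 'CN', 'TH'], index=df_demo.index)

# A plain list has no labels and is assigned by position.
df_demo['FromList'] = ['FR', 'CN', 'TH']

print(df_demo)
```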

## 4. DELETE
* 4.1 Delete rows
* 4.2 Delete columns

In [None]:
df2

### 4.1 Delete rows
We can use drop( label ) to delete a row.
Note: drop returns a new dataframe and leaves the original unchanged.

In [None]:
df3 = df2.drop("China")
df3

In [None]:
df2

If we want drop to modify the dataframe itself instead of returning a new one, we can set inplace to True.

In [None]:
df4 = df2.copy()
df4

In [None]:
df4.drop("China", inplace = True)
df4

In [None]:
df5 = df2.copy()
df5

In [None]:
df6 = df2.copy()
df6.drop("Thailand", axis=0, inplace=True)
df6
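
Several rows can be dropped in one call by passing a list of labels. A short sketch on an illustrative dataframe `df_demo`, which also confirms that the original is untouched when `inplace` is not set:

```python
import pandas as pd

df_demo = pd.DataFrame(
    {'Gold': [15, 12, 10], 'Continent': ['Europe', 'Asia', 'Asia']},
    index=['France', 'China', 'Thailand'])

# Drop two rows at once; the result is a new dataframe.
smaller = df_demo.drop(['China', 'Thailand'])

print(smaller.index.tolist())   # only France remains
print(df_demo.index.tolist())   # the original still has all three rows
```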

### 4.2 Delete columns
We can use the same drop( ) to delete a column. This time, we just have to set axis to 1.

(The default is axis=0, which means rows.)

In [1]:
df7 = df2.copy()
df7


In [None]:
df7.drop("Gold",axis=1, inplace=True)
df7
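
As an alternative to `axis=1`, `drop()` also accepts a `columns=` keyword, which states the intent directly. A minimal sketch on an illustrative dataframe `df_demo`:

```python
import pandas as pd

df_demo = pd.DataFrame(
    {'Gold': [15, 12, 10], 'Silver': [11, 9, 5], 'Bronze': [6, 8, 12],
     'Continent': ['Europe', 'Asia', 'Asia']},
    index=['France', 'China', 'Thailand'])

# These two calls are equivalent ways to drop the Gold column.
no_gold = df_demo.drop(columns='Gold')
no_gold_axis = df_demo.drop('Gold', axis=1)

# A list drops several columns at once.
no_minor_medals = df_demo.drop(columns=['Silver', 'Bronze'])

print(no_gold.columns.tolist())
print(no_minor_medals.columns.tolist())
```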