# Pandas 2
In the previous section we briefly went over how to read a CSV file into a dataframe and look at different columns, rows, and cells within the dataframe. Here we will continue to explore dataframes and use some standard statistical operations on them. Instead of reading a CSV file into a dataframe, let's create one manually by passing a dictionary to pd.DataFrame:

    dataframe = pd.DataFrame({Column1 Name: [row1 value, row2 value, etc.],
                              Column2 Name: [row1 value, row2 value, etc.],
                              etc.})

In [2]:
# import pandas
import pandas as pd

# create a dataframe called df
df = pd.DataFrame({'Column 1':[1, 2, 3],
                   'Column 2':[4, 5, 6],
                   'Column 3':[7, 8, 9]})

In [3]:
print(df)

   Column 1  Column 2  Column 3
0         1         4         7
1         2         5         8
2         3         6         9


In [10]:
print('axis 0 refers to: ' + str(df.axes[0]))
print('axis 1 refers to: ' + str(df.axes[1]))

axis 0 refers to: RangeIndex(start=0, stop=3, step=1)
axis 1 refers to: Index(['Column 1', 'Column 2', 'Column 3'], dtype='object')
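
The two axes printed above matter because many dataframe operations take an `axis` argument. As a minimal sketch (the choice of `.sum` is only an illustration), axis 0 runs down the rows and axis 1 runs across the columns:

```python
import pandas as pd

df = pd.DataFrame({'Column 1': [1, 2, 3],
                   'Column 2': [4, 5, 6],
                   'Column 3': [7, 8, 9]})

# axis=0 runs down the rows, giving one total per column
column_totals = df.sum(axis=0)

# axis=1 runs across the columns, giving one total per row
row_totals = df.sum(axis=1)

print(column_totals)
print(row_totals)
```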


### Adding Rows
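If we had another dataframe and wanted to add it to our existing data, we would use the pd.concat function. Older code often uses the dataframe's .append method for this, but it was deprecated in pandas 1.4 and removed in pandas 2.0: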

In [16]:
# create another dataframe
df1 = pd.DataFrame({'Column 1': [11],
                    'Column 2': [12],
                    'Column 3': [13]})
print(df)
print(df1)

   Column 1  Column 2  Column 3
0         1         4         7
1         2         5         8
2         3         6         9
   Column 1  Column 2  Column 3
0        11        12        13


In [51]:
# create a new dataframe by concatenating df and df1
df3 = pd.concat([df, df1])
# print df3
print(df3)

   Column 1  Column 2  Column 3
0         1         4         7
1         2         5         8
2         3         6         9
0        11        12        13


In [101]:
# ignore_index=True discards the original row labels and renumbers the rows from 0
df3 = pd.concat([df, df1], ignore_index=True)
# print df3
print(df3)

   Column 1  Column 2  Column 3
0         1         4         7
1         2         5         8
2         3         6         9
3        11        12        13
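
If the new row starts life as a plain dictionary rather than a dataframe, one possible approach is to wrap it in a one-row dataframe first and then concatenate as above. The name `new_row` is made up for this sketch:

```python
import pandas as pd

df = pd.DataFrame({'Column 1': [1, 2, 3],
                   'Column 2': [4, 5, 6],
                   'Column 3': [7, 8, 9]})

# hypothetical new row, held as a plain dictionary
new_row = {'Column 1': 11, 'Column 2': 12, 'Column 3': 13}

# wrapping the dictionary in a list makes pandas build a one-row dataframe
new_row_df = pd.DataFrame([new_row])

# concatenate and renumber the rows from 0
combined = pd.concat([df, new_row_df], ignore_index=True)
print(combined)
```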


### Adding Columns
To add a column, assign a list of values (one per row) to a new column name. To place the new column at a specific position, use the .insert method:

In [102]:
# create a new column from a list of values, one per row
df3['My New Column'] = [14, 15, 16, 17]

# print df3
print(df3)

   Column 1  Column 2  Column 3  My New Column
0         1         4         7             14
1         2         5         8             15
2         3         6         9             16
3        11        12        13             17


In [103]:
# insert a column named 'insert method' at position 0, filled with the string 'missing'
df3.insert(0, 'insert method', 'missing')

df3

  insert method  Column 1  Column 2  Column 3  My New Column
0       missing         1         4         7             14
1       missing         2         5         8             15
2       missing         3         6         9             16
3       missing        11        12        13             17
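
The .insert call above takes a position, a column name, and a value, and it changes the dataframe in place. A small sketch on a throwaway dataframe (the column names 'Inserted' and 'Flag' are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({'Column 1': [1, 2, 3],
                   'Column 2': [4, 5, 6]})

# insert(loc, column, value): loc is the zero-based position of the new column
df.insert(1, 'Inserted', [10, 20, 30])   # a list supplies one value per row
df.insert(0, 'Flag', 'missing')          # a single value is repeated for every row

print(df.columns.tolist())
```

Unlike plain assignment, which always adds the new column at the end, .insert lets you choose where the column goes.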


### Deleting Rows
You can use the .drop method to return a new dataframe without the rows whose index labels you specify. The original dataframe is left unchanged:

In [104]:
# drop the fourth row, which is given by index 3 as the index starts from zero
df_drop_row = df3.drop([3])
print(df_drop_row)

  insert method  Column 1  Column 2  Column 3  My New Column
0       missing         1         4         7             14
1       missing         2         5         8             15
2       missing         3         6         9             16


In [105]:
# drop the rows at positions 1 and 2 (the slice 1:3 excludes its end point)
df_drop_slice_row = df3.drop(df3.index[1:3])
print(df_drop_slice_row)

  insert method  Column 1  Column 2  Column 3  My New Column
0       missing         1         4         7             14
3       missing        11        12        13             17
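
Note that the remaining rows keep their original labels (0 and 3 above). As a rough sketch, assuming you want consecutive labels again, you could chain .reset_index(drop=True) after the drop:

```python
import pandas as pd

df = pd.DataFrame({'Column 1': [1, 2, 3, 11],
                   'Column 2': [4, 5, 6, 12]})

# .drop returns a new dataframe; df itself still has all four rows
dropped = df.drop(df.index[1:3])
print(len(df), len(dropped))

# drop=True throws the old labels away instead of keeping them as a column
renumbered = dropped.reset_index(drop=True)
print(renumbered)
```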


### Deleting Columns
You can delete a column with the .drop method (passing axis=1), the .pop method, or the del statement:

In [106]:
# here we create a new dataframe from df3 by using .drop
df_drop = df3.drop(['My New Column'], axis=1)
df_drop

  insert method  Column 1  Column 2  Column 3
0       missing         1         4         7
1       missing         2         5         8
2       missing         3         6         9
3       missing        11        12        13


In [107]:
# using .pop to remove a column from df3
df3.pop('My New Column')
df3

  insert method  Column 1  Column 2  Column 3
0       missing         1         4         7
1       missing         2         5         8
2       missing         3         6         9
3       missing        11        12        13


In [108]:
# re-create the column that .pop removed, using a list of values
df3['My New Column'] = [14, 15, 16, 17]

# print df3
print(df3)

  insert method  Column 1  Column 2  Column 3  My New Column
0       missing         1         4         7             14
1       missing         2         5         8             15
2       missing         3         6         9             16
3       missing        11        12        13             17


In [109]:
# using .drop
df_drop = df3.drop(['My New Column'], axis=1)
df_drop

  insert method  Column 1  Column 2  Column 3
0       missing         1         4         7
1       missing         2         5         8
2       missing         3         6         9
3       missing        11        12        13


In [110]:
# using del
del df3['insert method']
df3

   Column 1  Column 2  Column 3  My New Column
0         1         4         7             14
1         2         5         8             15
2         3         6         9             16
3        11        12        13             17
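
The three approaches differ in what they change and what they return. A side-by-side sketch on a throwaway dataframe (the column names 'A' to 'D' are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4], 'C': [5, 6], 'D': [7, 8]})

# .drop returns a new dataframe and leaves df untouched
without_d = df.drop(['D'], axis=1)

# .pop removes the column from df in place and returns it as a Series
popped = df.pop('C')

# del removes the column from df in place and returns nothing
del df['B']

print(without_d.columns.tolist())
print(popped.tolist())
print(df.columns.tolist())
```

Use .drop when you want to keep the original dataframe, .pop when you still need the removed values, and del when you simply want the column gone.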


## Iteration
Iterating over a dataframe directly yields its column names:

In [111]:
for columns in df3:
    print(columns)

Column 1
Column 2
Column 3
My New Column


In [112]:
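# .items() yields (column name, Series) pairs; it replaces .iteritems(), which was removed in pandas 2.0
for col, series in df3.items():
    print(col+'\n', series)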

Column 1
 0     1
1     2
2     3
3    11
Name: Column 1, dtype: int64
Column 2
 0     4
1     5
2     6
3    12
Name: Column 2, dtype: int64
Column 3
 0     7
1     8
2     9
3    13
Name: Column 3, dtype: int64
My New Column
 0    14
1    15
2    16
3    17
Name: My New Column, dtype: int64


In [113]:
# .iterrows() yields one (index label, Series) tuple per row
for row in df3.iterrows():
    print(row)

(0, Column 1          1
Column 2          4
Column 3          7
My New Column    14
Name: 0, dtype: int64)
(1, Column 1          2
Column 2          5
Column 3          8
My New Column    15
Name: 1, dtype: int64)
(2, Column 1          3
Column 2          6
Column 3          9
My New Column    16
Name: 2, dtype: int64)
(3, Column 1         11
Column 2         12
Column 3         13
My New Column    17
Name: 3, dtype: int64)
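
Because each item from .iterrows() is a tuple, it is usually more convenient to unpack it in the loop header. A minimal sketch (the names `label`, `row`, and `row_sums` are arbitrary):

```python
import pandas as pd

df = pd.DataFrame({'Column 1': [1, 2, 3],
                   'Column 2': [4, 5, 6]})

row_sums = []
# unpack each tuple into the index label and the row's Series
for label, row in df.iterrows():
    # individual cells are then reachable by column name
    row_sums.append(row['Column 1'] + row['Column 2'])
    print(label, row['Column 1'], row['Column 2'])

print(row_sums)
```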


In [114]:
# .shape is a tuple of (number of rows, number of columns)
df3.shape

(4, 4)

# Index Operations
There are various operations that can be performed on a dataframe's index. One of the most common is to reindex a dataframe using one of its columns. To make a column the index, use .set_index:

In [115]:
# create a new dataframe based on df3 but with 'Column 1' as the index
col_1_df = df3.set_index('Column 1')

col_1_df

          Column 2  Column 3  My New Column
Column 1
1                4         7             14
2                5         8             15
3                6         9             16
11              12        13             17


In [116]:
# create a duplicate value in df3 (.loc sets the cell directly and avoids chained assignment)
df3.loc[0, 'Column 1'] = 2

df3

   Column 1  Column 2  Column 3  My New Column
0         2         4         7             14
1         2         5         8             15
2         3         6         9             16
3        11        12        13             17


In [117]:
# verify_integrity=True raises an error if the new index would contain duplicates
col_1_df = df3.set_index('Column 1', verify_integrity=True)

col_1_df

ValueError: Index has duplicate keys: Int64Index([2], dtype='int64', name='Column 1')
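
If you would rather check for duplicates before calling .set_index than handle the error afterwards, one option is the sketch below. It rebuilds a small dataframe with the same duplicate so that it runs on its own:

```python
import pandas as pd

df = pd.DataFrame({'Column 1': [2, 2, 3, 11],
                   'Column 2': [4, 5, 6, 12]})

# is_unique is False as soon as any value in the column repeats
print(df['Column 1'].is_unique)

# keep=False marks every member of a duplicated group, not just the later ones
duplicates = df[df['Column 1'].duplicated(keep=False)]
print(duplicates)

# alternatively, catch the error that verify_integrity raises
try:
    df.set_index('Column 1', verify_integrity=True)
    raised = False
except ValueError as err:
    raised = True
    print(err)
```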

# Accessing And Setting Values
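There are two accessors for reading or setting a single value. The first is .iat, which uses integer positions; the second is .at, which uses the row label and the column name:

    dataframe.iat[row position, column position]
    dataframe.at[row label, column name]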

In [118]:
print('The value of cell (0,0) in df3 is: ' + str(df3.iat[0,0]))

The value of cell (0,0) in df3 is: 2


In [119]:
# .at looks up the same cell by row label and column name
df3.at[0, 'Column 1']

2

In [120]:
print('The current value of df3 cell (0,0) is: ' + str(df3.iat[0,0]))

df3.iat[0,0] = 5

print('The new value of df3 cell (0,0) is: ' + str(df3.iat[0,0]))

The current value of df3 cell (0,0) is: 2
The new value of df3 cell (0,0) is: 5
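
In df3 the row labels happen to equal the row positions, which hides the difference between the two accessors. The sketch below uses made-up string labels ('a', 'b', 'c') to make the distinction visible:

```python
import pandas as pd

df = pd.DataFrame({'Column 1': [2, 2, 3],
                   'Column 2': [4, 5, 6]},
                  index=['a', 'b', 'c'])   # hypothetical string labels

# .iat addresses a cell by integer position: third row, second column
by_position = df.iat[2, 1]

# .at addresses the same cell by row label and column name
by_label = df.at['c', 'Column 2']

# both accessors can also be used to set a value
df.at['a', 'Column 1'] = 5

print(by_position, by_label)
print(df)
```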


# Replacing Values
If you want to replace every occurrence of a value in a dataframe, you can use the .replace method. It returns a new dataframe and leaves the original unchanged:

In [149]:
print(df3)

print('\nReplacing all values of 5 with 99')
      
df4 = df3.replace(5,99)
df4

   Column 1  Column 2  Column 3  My New Column
0         5         4         7             14
1         2         5         8             15
2         3         6         9             16
3        11        12        13             17

Replacing all values of 5 with 99


   Column 1  Column 2  Column 3  My New Column
0        99         4         7             14
1         2        99         8             15
2         3         6         9             16
3        11        12        13             17


In [150]:
# a nested dictionary restricts the replacement to 'Column 1'
df4 = df3.replace({'Column 1': {5:99}})

df4

   Column 1  Column 2  Column 3  My New Column
0        99         4         7             14
1         2         5         8             15
2         3         6         9             16
3        11        12        13             17
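
Several values can also be replaced in one call by passing a flat dictionary that maps each old value to its new value. A short sketch on a throwaway dataframe (the replacement values 99 and 0 are arbitrary):

```python
import pandas as pd

df = pd.DataFrame({'Column 1': [5, 2, 3],
                   'Column 2': [4, 5, 6]})

# every 5 becomes 99 and every 6 becomes 0, in all columns
replaced = df.replace({5: 99, 6: 0})

print(replaced)
print(df)   # the original dataframe is unchanged
```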


# Sorting
You can sort a dataframe by its column values with .sort_values or by its index with .sort_index:

    dataframe.sort_values(column name)
    dataframe.sort_index()

In [151]:
print('Current arrangement of df3: \n')
print(df3)
print('\nAfter sorting by column 1: \n')
df3.sort_values('Column 1')

Current arrangement of df3: 

   Column 1  Column 2  Column 3  My New Column
0         5         4         7             14
1         2         5         8             15
2         3         6         9             16
3        11        12        13             17

After sorting by column 1: 



   Column 1  Column 2  Column 3  My New Column
1         2         5         8             15
2         3         6         9             16
0         5         4         7             14
3        11        12        13             17


In [152]:
# sort by the index in descending order
df3.sort_index(ascending=False)

   Column 1  Column 2  Column 3  My New Column
3        11        12        13             17
2         3         6         9             16
1         2         5         8             15
0         5         4         7             14


In [153]:
# create a dataframe called df
df = pd.DataFrame({'Column 1':[1, 1, 1, 2, 2, 2],
                   'Column 2':[1, 2, 3, 3, 2, 1],
                   'Column 3':[1, 2, 3, 4, 5, 6]})
print('Current arrangement of df: \n')
print(df)
print('\nAfter sorting by Column 1 then Column 2: \n')
# sort by 'Column 1' in descending order, then break ties by 'Column 2' in ascending order
df.sort_values(['Column 1', 'Column 2'], ascending=[False, True])

Current arrangement of df: 

   Column 1  Column 2  Column 3
0         1         1         1
1         1         2         2
2         1         3         3
3         2         3         4
4         2         2         5
5         2         1         6

After sorting by Column 1 then Column 2: 



   Column 1  Column 2  Column 3
5         2         1         6
4         2         2         5
3         2         3         4
0         1         1         1
1         1         2         2
2         1         3         3
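
Both sorting methods return a new dataframe rather than changing the original, and the sorted rows keep their old index labels, as the outputs above show. A minimal sketch, assuming you want to keep the sorted result and renumber its rows:

```python
import pandas as pd

df = pd.DataFrame({'Column 1': [5, 2, 3, 11],
                   'Column 2': [4, 5, 6, 12]})

# assign the result to a name to keep it; reset_index(drop=True) renumbers the rows
sorted_df = df.sort_values('Column 1', ascending=False).reset_index(drop=True)

print(sorted_df)
print(df)   # the original order is untouched
```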
