Sources:

- https://ujjwalkarn.me/2016/05/30/common-operations-on-pandas-dataframe/


Index: 

- Renaming Columns in Pandas
- Deleting Columns from pandas DataFrame
- Adding new Column to existing DataFrame
- Add one Row in a pandas.DataFrame
- Changing the order of DataFrame Columns
- Changing data type of Columns
- Getting a list of the column headers from a DataFrame
- Converting list of dictionaries to DataFrame
- Getting row count of pandas DataFrame
- Most efficient way to loop through DataFrames
- Deleting DataFrame row based on column value
- Dropping a list of rows from Pandas DataFrame

In [25]:
import pandas as pd
df = None
def initialize_df():  # helper that resets df to the same two-column frame; called throughout
    global df
    df = pd.DataFrame({'col_1':[1,2], 'col_2': [10,20]})

# Renaming Columns in Pandas
## rename all columns

In [19]:
initialize_df()
print(df.columns)
df.columns = ['a','b']
print(df.columns)

Index(['col_1', 'col_2'], dtype='object')
Index(['a', 'b'], dtype='object')


Use ```df.rename()``` and pass a mapping from old names to new names. Columns not listed in the mapping keep their names:

In [20]:
# rename returns a new DataFrame; reassign it to keep the result
df = df.rename(columns={'a': 'newName1', 'b': 'newName2'})
# Or rename the existing DataFrame in place (a no-op here, since the columns are already renamed)
df.rename(columns={'a': 'newName1', 'b': 'newName2'}, inplace=True)
df

Unnamed: 0,newName1,newName2
0,1,10
1,2,20
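As a small sketch of a partial rename (the frame `df_demo` and the new names are made up for illustration), `rename` leaves unlisted columns alone and also accepts a function that is applied to every label:

```python
import pandas as pd

# Stand-in frame, same shape as the one used in this notebook.
df_demo = pd.DataFrame({'col_1': [1, 2], 'col_2': [10, 20]})

# Rename only one column; 'col_2' keeps its name.
partial = df_demo.rename(columns={'col_1': 'id'})

# Pass a callable to transform every column label.
upper = df_demo.rename(columns=str.upper)

print(list(partial.columns))
print(list(upper.columns))
```

Both calls return new DataFrames, so `df_demo` itself is unchanged.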


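Replacing a substring in all column names (here, removing a literal `$`). Pass `regex=False` so that `$` is not read as the regex end-of-string anchor:

In [31]:
df.columns = df.columns.str.replace('$', '', regex=False)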

initialize_df()
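The frame above has no `$` in its column names, so the call has no visible effect there. A minimal sketch on a hypothetical frame (`prices` and its column names are assumptions) shows the replacement actually happening:

```python
import pandas as pd

# Hypothetical frame whose column names contain a literal '$'.
prices = pd.DataFrame({'$price': [1, 2], 'qty$': [3, 4]})

# regex=False treats '$' as a plain character, not as the end-of-string anchor.
prices.columns = prices.columns.str.replace('$', '', regex=False)

print(list(prices.columns))
```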

# Deleting Columns from pandas DataFrame

The simple way is `del`, which removes the column in place:

In [33]:
del df['col_1']
df

Unnamed: 0,col_2
0,10
1,20


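A more flexible way is `df.drop()`, which returns a new DataFrame instead of modifying `df`.

The `axis` argument selects what is dropped: 0 for rows, 1 for columns.

In [41]:
df = df.drop('col_2', axis=1)  # col_1 was deleted above, so this leaves no columns
print(df)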
print(df)

initialize_df()
# or drop in place, without reassigning

df.drop('col_1', axis=1, inplace=True)
df

Empty DataFrame
Columns: []
Index: [0, 1]


Unnamed: 0,col_2
0,10
1,20


In [42]:
initialize_df()

Finally, to drop by column position instead of by column label, index into `df.columns`, e.g. to delete the 2nd column:

In [44]:
df.drop(df.columns[[1]], axis=1)  # df.columns is zero-based pd.Index 

Unnamed: 0,col_1
0,1
1,2
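The same idea extends to several positions at once. A sketch on a wider, made-up frame (`wide` is an assumption, since the notebook's frame has only two columns) that drops the 1st, 2nd and 4th columns:

```python
import pandas as pd

# Hypothetical five-column frame.
wide = pd.DataFrame({'a': [1], 'b': [2], 'c': [3], 'd': [4], 'e': [5]})

# Positions 0, 1 and 3 are the 1st, 2nd and 4th columns.
trimmed = wide.drop(columns=wide.columns[[0, 1, 3]])

print(list(trimmed.columns))
```

`drop` still works by label here: the positions are first translated to labels through `wide.columns`.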


In [46]:
# pop removes the column in place and returns it as a Series
df.pop('col_1')

0    1
1    2
Name: col_1, dtype: int64

In [83]:
initialize_df()

# Adding new Column to existing DataFrame

In [84]:
df['col_3']=[1,2]
df

Unnamed: 0,col_1,col_2,col_3
0,1,10,1
1,2,20,2
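Two other ways to add a column, sketched on a stand-in frame (`base` and the new column names are assumptions): `assign` returns a new DataFrame, while `insert` works in place and lets you choose the position.

```python
import pandas as pd

base = pd.DataFrame({'col_1': [1, 2], 'col_2': [10, 20]})

# assign returns a new DataFrame with a computed column; base is unchanged.
with_sum = base.assign(col_sum=base['col_1'] + base['col_2'])

# insert modifies base in place and puts the new column at position 0.
base.insert(0, 'col_0', [0, 0])

print(list(with_sum.columns))
print(list(base.columns))
```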


# Add one Row in a pandas.DataFrame

In [85]:
df

Unnamed: 0,col_1,col_2,col_3
0,1,10,1
1,2,20,2


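Without `ignore_index`, the added row keeps its own index label (0). `DataFrame.append` was removed in pandas 2.0, so `pd.concat` is used here; it produces the same result:

In [86]:
pd.concat([df, pd.DataFrame([{'col_1':3,'col_2': 4,'col_3':5}])])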

Unnamed: 0,col_1,col_2,col_3
0,1,10,1
1,2,20,2
0,3,4,5


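With `ignore_index=True`, the rows of the result are renumbered from 0:

In [87]:
# pd.concat stands in for DataFrame.append, which was removed in pandas 2.0
pd.concat([df, pd.DataFrame([{'col_1':3,'col_2': 4,'col_3':5}])], ignore_index=True)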

Unnamed: 0,col_1,col_2,col_3
0,1,10,1
1,2,20,2
2,3,4,5
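Another way to add a single row, sketched on a stand-in frame (`rows` is an assumption), is assigning to a new label through `.loc`. This modifies the frame in place and assumes the default 0..n-1 index, so that `len(rows)` is an unused label:

```python
import pandas as pd

rows = pd.DataFrame({'col_1': [1, 2], 'col_2': [10, 20], 'col_3': [1, 2]})

# With a default RangeIndex, len(rows) is the next unused label.
rows.loc[len(rows)] = [3, 4, 5]

print(rows)
```

Growing a DataFrame one row at a time gets slow for many rows; collecting the rows in a list and building the DataFrame once is usually the better choice.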


# Changing the order of DataFrame Columns

In [88]:
df = df[['col_2', 'col_3','col_1']]
df

Unnamed: 0,col_2,col_3,col_1
0,10,1,1
1,20,2,2


In [89]:
df = df.reindex(['col_1'] + list(df.columns[:-1]), axis=1)
df

Unnamed: 0,col_1,col_2,col_3
0,1,10,1
1,2,20,2
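When the full column list is not known in advance, the new order can be built from `df.columns`. A minimal sketch (the names `frame` and `front` are assumptions) that moves one column to the front and keeps the rest as they are:

```python
import pandas as pd

frame = pd.DataFrame({'col_1': [1, 2], 'col_2': [10, 20], 'col_3': [1, 2]})

# Put 'col_3' first and keep the remaining columns in their current order.
front = 'col_3'
reordered = frame[[front] + [c for c in frame.columns if c != front]]

print(list(reordered.columns))
```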


# Changing data type of Columns

Consider the ```category``` dtype for string columns with few distinct values; it can reduce memory use.

In [90]:
df.astype(float)

Unnamed: 0,col_1,col_2,col_3
0,1.0,10.0,1.0
1,2.0,20.0,2.0


In [91]:
df['col_1'] = df['col_1'].astype(float)  # convert a single column
df

Unnamed: 0,col_1,col_2,col_3
0,1.0,10,1
1,2.0,20,2
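To convert several columns at once, `astype` also accepts a dict of column name to dtype, and `pd.to_numeric` handles strings that may not parse. A sketch on a made-up frame (`mixed` and its contents are assumptions):

```python
import pandas as pd

mixed = pd.DataFrame({
    'n': ['1', '2', 'x'],             # numbers stored as strings, one invalid
    'city': ['Oslo', 'Rome', 'Oslo'],  # few distinct values
    'k': [1, 2, 3],
})

# A dict converts only the named columns.
typed = mixed.astype({'k': 'float64', 'city': 'category'})

# errors='coerce' turns unparseable strings into NaN instead of raising.
typed['n'] = pd.to_numeric(typed['n'], errors='coerce')

print(typed.dtypes)
```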


# Getting a list of the column headers from a DataFrame

In [92]:
df.columns.values.tolist()

['col_1', 'col_2', 'col_3']

In [93]:
list(df)

['col_1', 'col_2', 'col_3']

# Converting list of dictionaries to DataFrame

In [97]:
dict_list = [{'a':1,'b':1},{'a':2,'b':2}]
pd.DataFrame(dict_list)

Unnamed: 0,a,b
0,1,1
1,2,2
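The dictionaries do not need to share the same keys. In this sketch (the `records` list is made up), keys missing from a dictionary become NaN in the resulting frame:

```python
import pandas as pd

# Hypothetical records with differing keys.
records = [{'a': 1, 'b': 1}, {'a': 2}, {'a': 3, 'c': 9}]
ragged = pd.DataFrame(records)

print(ragged)
```

Columns that receive NaN are stored as float, so `b` and `c` are no longer integer columns.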


# Getting row count of pandas DataFrame

In [98]:
df

Unnamed: 0,col_1,col_2,col_3
0,1.0,10,1
1,2.0,20,2


In [99]:
df.shape  # (rows, columns); df.shape[0] or len(df) gives the row count

(2, 3)
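A short sketch of the common ways to count rows, on a stand-in frame with one missing value (`counts` is an assumption). Note that `count()` is different: it reports non-null values per column, not the number of rows.

```python
import pandas as pd

counts = pd.DataFrame({'col_1': [1.0, None], 'col_2': [10, 20]})

n_rows = len(counts)             # number of rows
n_rows_shape = counts.shape[0]   # same value, taken from (rows, columns)
non_null = counts.count()        # non-null values per column

print(n_rows, n_rows_shape)
print(non_null)
```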

# Most efficient way to loop through DataFrames

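Iterating over rows. `iterrows()` yields each row as a Series, so mixed column types are upcast to a common dtype (float64 in the output below):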

In [112]:
for index, row in df.iterrows():
    print(index, row)

0 col_1     1.0
col_2    10.0
col_3     1.0
Name: 0, dtype: float64
1 col_1     2.0
col_2    20.0
col_3     2.0
Name: 1, dtype: float64


In [121]:
a = df.itertuples()
list(a)

[Pandas(Index=0, col_1=1.0, col_2=10, col_3=1),
 Pandas(Index=1, col_1=2.0, col_2=20, col_3=2)]
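As a rough guide, column-wise (vectorized) operations are usually the fastest option, and `itertuples()` is usually faster than `iterrows()` when a loop is unavoidable. A sketch computing the same product three ways on a stand-in frame (`frame` is an assumption):

```python
import pandas as pd

frame = pd.DataFrame({'col_1': [1.0, 2.0], 'col_2': [10, 20]})

# Vectorized: operates on whole columns at once, no Python-level loop.
vectorized = (frame['col_1'] * frame['col_2']).tolist()

# itertuples: one namedtuple per row, fields accessed by column name.
looped = [row.col_1 * row.col_2 for row in frame.itertuples(index=False)]

# iterrows: each row is a Series, so col_2 is upcast from int to float.
first_row = next(frame.iterrows())[1]

print(vectorized, looped, first_row.dtype)
```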

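Iterating over columns: either transpose and iterate over rows as above, or loop over `df.columns`.

In [127]:
df.T  # transpose, then iterate over rows as above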

Unnamed: 0,0,1
col_1,1.0,2.0
col_2,10.0,20.0
col_3,1.0,2.0


In [129]:
for column in df.columns:
    print(df[column])

0    1.0
1    2.0
Name: col_1, dtype: float64
0    10
1    20
Name: col_2, dtype: int64
0    1
1    2
Name: col_3, dtype: int64


# Deleting DataFrame row in Pandas based on column value

In [130]:
df[df['col_1']!=1]

Unnamed: 0,col_1,col_2,col_3
1,2.0,20,2
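The same boolean-mask pattern covers several values or several conditions. A sketch on a made-up frame (`data` is an assumption): `isin` tests membership, `~` negates a mask, and conditions are combined with `&` or `|`, each wrapped in parentheses.

```python
import pandas as pd

data = pd.DataFrame({'col_1': [1, 2, 3, 4], 'col_2': [10, 20, 30, 40]})

# Drop rows whose col_1 is 1 or 3.
without_some = data[~data['col_1'].isin([1, 3])]

# Keep rows that satisfy both conditions.
filtered = data[(data['col_1'] != 1) & (data['col_2'] < 40)]

print(without_some)
print(filtered)
```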


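To keep only the rows where a column is not null, use ```notnull()```.

Don't use ```df = df[df.line_race != None]```: the element-wise comparison with ```None``` does not detect missing values, so no rows are filtered out.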

In [131]:
df[df.col_2.notnull()]


Unnamed: 0,col_1,col_2,col_3
0,1.0,10,1
1,2.0,20,2
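The frame above has no missing values, so the filter keeps every row. A sketch with an actual NaN (the `races` frame is made up, reusing the `line_race` column name from the warning above) shows the difference between the two approaches:

```python
import numpy as np
import pandas as pd

races = pd.DataFrame({'line_race': [1.0, np.nan, 3.0]})

# Comparing with None does not detect NaN: every row passes the filter.
kept_wrong = races[races['line_race'] != None]  # noqa: E711

# notnull() (an alias of notna()) drops the row with the missing value.
kept_right = races[races['line_race'].notnull()]

# dropna with subset gives the same result.
kept_dropna = races.dropna(subset=['line_race'])

print(len(kept_wrong), len(kept_right), len(kept_dropna))
```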


# Dropping a list of rows from Pandas DataFrame

In [132]:
df.drop(df.index[[0]])

Unnamed: 0,col_1,col_2,col_3
1,2.0,20,2
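With more rows, the list can hold several positions or labels. A sketch on a made-up frame with string labels (`table` is an assumption), where the two approaches are easier to tell apart:

```python
import pandas as pd

table = pd.DataFrame({'v': [10, 20, 30, 40]}, index=['a', 'b', 'c', 'd'])

# By position: translate positions 0 and 2 to labels through table.index.
by_position = table.drop(table.index[[0, 2]])

# By label: pass the labels directly.
by_label = table.drop(index=['a', 'c'])

print(by_position)
```

Both calls return a new DataFrame; `table` keeps all four rows unless the result is assigned back.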
