In [1]:
import pandas as pd
import numpy as np
from column_completer import ColumnCompleter

Make up some data

In [2]:
X = np.random.randint(0, 100, (10, 5)) 
df = pd.DataFrame(X, columns=['Type A', 'Type B', 'Type C', 'Type D', 'Type E']) 
df

Unnamed: 0,Type A,Type B,Type C,Type D,Type E
0,70,88,32,79,39
1,6,8,56,97,86
2,8,55,42,31,36
3,32,72,80,80,43
4,82,22,24,66,54
5,60,51,62,90,8
6,73,21,3,1,43
7,89,49,35,97,33
8,75,67,18,25,16
9,96,44,50,10,41


Instantiate the `ColumnCompleter` object. To keep the attribute names valid Python identifiers, every space in a column name is replaced with an underscore.
The value returned is still the original column name.

In [3]:
q = ColumnCompleter(df, space_filler='_')  # '_' is the default value; it is passed explicitly here for clarity

In [4]:
q.Type_A

'Type A'

In [5]:
q.Type_D

'Type D'
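
The attribute lookup shown above could be implemented in several ways. The following is only a minimal sketch of one possibility, not the actual `ColumnCompleter` source: the class name `NameCompleter` and its internals are hypothetical, and it works on a plain list of names so that it runs without Pandas.

```python
class NameCompleter:
    """Hypothetical stand-in that exposes column names as attributes."""

    def __init__(self, columns, space_filler="_"):
        # Map each attribute-safe name to the original column name.
        self._lookup = {str(c).replace(" ", space_filler): c for c in columns}

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            return self._lookup[name]
        except KeyError:
            raise AttributeError(name) from None

    def __dir__(self):
        # dir() is one of the sources interactive shells use for completion.
        return list(super().__dir__()) + list(self._lookup)


q_sketch = NameCompleter(["Type A", "Type B"])
print(q_sketch.Type_A)  # prints: Type A
```

Listing the generated names in `__dir__` is what would make them show up as completion candidates in this sketch.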

This is convenient for indexing when Pandas won't autocomplete column names, for example in deeply nested calls (usually more nested than this one).

In [6]:
df[q.Type_B] 

0    88
1     8
2    55
3    72
4    22
5    51
6    21
7    49
8    67
9    44
Name: Type B, dtype: int32

Creating the completion object from a DataFrame whose column names start or end with spaces raises warnings.

In [7]:
df = pd.DataFrame(X, columns=['type A ', 'type B  ', ' type C ', ' type D', 'type E']) 

In [8]:
q = ColumnCompleter(df)

Warning: Index(['The following columns starts with one or more spaces:  type C ', 'The following columns starts with one or more spaces:  type D'], dtype='object')

Unless `silence_warnings` is set to `True`:

In [9]:
q = ColumnCompleter(df, silence_warnings=True)

In [10]:
q.type_E

'type E'

In [11]:
q.type_A_

'type A '
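
To see which column names would trigger such a warning, the check can be reproduced by hand. This is a rough sketch under the assumption that only leading and trailing space characters matter; the helper `find_padded_names` is hypothetical and not part of `column_completer`.

```python
def find_padded_names(names):
    """Return (leading, trailing): names that start or end with a space."""
    leading = [n for n in names if n != n.lstrip(" ")]
    trailing = [n for n in names if n != n.rstrip(" ")]
    return leading, trailing


columns = ['type A ', 'type B  ', ' type C ', ' type D', 'type E']
leading, trailing = find_padded_names(columns)
print(leading)   # [' type C ', ' type D']
print(trailing)  # ['type A ', 'type B  ', ' type C ']
```

A name such as `' type C '` appears in both lists because it is padded on both sides.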

If replacing the spaces causes column names to collide, a `ValueError` is raised.

In [12]:
df = pd.DataFrame(X, columns=['type A', 'type_A', 'type B', 'type C', 'type D']) 

In [13]:
q = ColumnCompleter(df)

ValueError: Using '_' as a replacemnt for spaces causes a collision of column names, please chose another.

Using another value for `space_filler`, such as two underscores, resolves the problem in this particular case.

In [14]:
q = ColumnCompleter(df, space_filler='__')

In [15]:
q.type__A

'type A'

In [16]:
q.type_A

'type_A'
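
The collision and its resolution can be checked without the library. The sketch below assumes a collision simply means that two different column names map to the same string after the spaces are replaced; `find_collisions` is a hypothetical helper, not a `column_completer` function.

```python
from collections import defaultdict


def find_collisions(names, space_filler="_"):
    """Group original names by their space-replaced form; keep only clashes."""
    groups = defaultdict(list)
    for name in names:
        groups[name.replace(" ", space_filler)].append(name)
    return {new: old for new, old in groups.items() if len(old) > 1}


columns = ['type A', 'type_A', 'type B', 'type C', 'type D']
print(find_collisions(columns, "_"))   # {'type_A': ['type A', 'type_A']}
print(find_collisions(columns, "__"))  # {}
```

With `'__'` as the filler, `'type A'` becomes `'type__A'`, which no longer clashes with the existing `'type_A'`.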

A better approach is often to rename the columns of the DataFrame, and the `ColumnCompleter` class provides a convenience function for doing just that.

Note that the functionality is provided by the _class_, not the _class instance_ (`q`)!

In [17]:
df = pd.DataFrame(X, columns=['type A', 'type B', 'type C', 'type D', 'type E']) 
ColumnCompleter.replace_df_column_spaces(df, '_')

Unnamed: 0,type_A,type_B,type_C,type_D,type_E
0,70,88,32,79,39
1,6,8,56,97,86
2,8,55,42,31,36
3,32,72,80,80,43
4,82,22,24,66,54
5,60,51,62,90,8
6,73,21,3,1,43
7,89,49,35,97,33
8,75,67,18,25,16
9,96,44,50,10,41


Capitalizing column names can help distinguish them from Pandas' own methods; capitalization is enabled by setting `capatilize_first_letter=True`.

In [18]:
ColumnCompleter.replace_df_column_spaces(df, '_', capatilize_first_letter=True)

Unnamed: 0,Type_A,Type_B,Type_C,Type_D,Type_E
0,70,88,32,79,39
1,6,8,56,97,86
2,8,55,42,31,36
3,32,72,80,80,43
4,82,22,24,66,54
5,60,51,62,90,8
6,73,21,3,1,43
7,89,49,35,97,33
8,75,67,18,25,16
9,96,44,50,10,41
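
The renaming shown in the two cells above can be approximated with a plain mapping from old to new names. This is a sketch only: `build_rename_map` and its `capitalize_first_letter` parameter are hypothetical names, and the actual behavior of `replace_df_column_spaces` may differ in details.

```python
def build_rename_map(names, space_filler="_", capitalize_first_letter=False):
    """Map each original name to its space-free (optionally capitalized) form."""
    mapping = {}
    for name in names:
        new_name = name.replace(" ", space_filler)
        if capitalize_first_letter and new_name:
            # Upper-case only the first character; str.capitalize() would
            # also lower-case the rest and turn 'type_A' into 'Type_a'.
            new_name = new_name[0].upper() + new_name[1:]
        mapping[name] = new_name
    return mapping


mapping = build_rename_map(['type A', 'type B'], capitalize_first_letter=True)
print(mapping)  # {'type A': 'Type_A', 'type B': 'Type_B'}
```

A mapping like this can be passed to Pandas as `df.rename(columns=mapping)`, which returns a renamed copy of the DataFrame.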


A `ValueError` is raised if the rename causes a name collision.

In [19]:
df = pd.DataFrame(X, columns=['type A', 'type_A', 'type B', 'type C', 'type D']) 
ColumnCompleter.replace_df_column_spaces(df, '_')

ValueError: Renaming the columns in such a way would cause a collision of column names.