# Demo columns cleanup automation

This utility function cleans up the column names of a DataFrame.

It removes spaces from column names and capitalizes the first letter of each word.

Using this script makes it easier to work with a dataframe.

For example, instead of selecting a column with dictionary-like bracket notation, you can use attribute notation, which is quicker to type.

```python
df['Date of birth']
```

vs.

```python
df.DateOfBirth
```
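
As a minimal sketch of the renaming itself, the two string operations can be tried on a small DataFrame. The sample data and column names below are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data; the column names are made up for illustration.
df = pd.DataFrame({
    "Date of birth": ["1990-01-01", "1985-06-15"],
    "first name": ["Ada", "Alan"],
})

# The same two steps the cleanup function applies:
# title-case each word, then remove the spaces.
df.columns = df.columns.str.title().str.replace(" ", "", regex=False)

print(list(df.columns))
print(df.DateOfBirth.tolist())
```

Attribute notation only works when the resulting name is a valid Python identifier and does not clash with an existing DataFrame attribute or method, so bracket notation remains the fallback for other names.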


In [1]:
import pandas as pd
from datetime import datetime
import numpy as np



In [2]:
def CleanCols(df):
    """
    Renames the columns of a DataFrame in place by capitalizing each word and
    removing spaces (e.g. 'Date of birth' becomes 'DateOfBirth').
    Prints the DataFrame vitals (shape, columns grouped by data type, etc.)
    Arguments:
        df: a DataFrame
    Returns:
        The same DataFrame with clean columns
    """
    
    df.columns = df.columns.str.title()
    df.columns = df.columns.str.replace(' ','')
    print(f'Dataframe has {df.shape[0]} rows and {df.shape[1]} columns\n')
    
    # Group the column names by data type
    df_types_groups = df.columns.to_series().groupby(df.dtypes).apply(list).reset_index()
    df_types_groups.columns = ['data_type', 'fields']
    
    # Print each data type followed by its columns in alphabetical order
    for index, row in df_types_groups.iterrows():
        sorted_cols = sorted(row.fields)
        data_type = str(row.data_type)  # e.g. 'int64' rather than the dtype's class name
        print("**" + data_type + "**")
        print(f'   ({len(sorted_cols)} columns):\n {sorted_cols}\n\n')
            
    # datetime is the class imported above, so call now() on it directly
    now = datetime.now()
    
    print(now.strftime('\nData loaded with clean columns on %d %b %Y at %H:%M:%S\n'))
    
    return df
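

The data type summary printed by the function can be illustrated on its own. The sketch below is not the function's implementation: it swaps the `groupby` call for a plain dictionary loop, and the sample frame and the name `cols_by_dtype` are assumptions made for the example.

```python
import pandas as pd

# Hypothetical sample frame with columns of several data types.
df = pd.DataFrame({
    "Age": [30, 41],
    "Height": [1.70, 1.82],
    "Name": ["Ada", "Alan"],
    "Score": [88, 92],
})

# Map each dtype name to the list of columns that have that dtype.
cols_by_dtype = {}
for col, dtype in df.dtypes.items():
    cols_by_dtype.setdefault(str(dtype), []).append(col)

# Sort the column names within each group, as the function does before printing.
cols_by_dtype = {name: sorted(cols) for name, cols in cols_by_dtype.items()}

for name, cols in cols_by_dtype.items():
    print(f"{name} ({len(cols)} columns): {cols}")
```

The dtype name is used as the key because plain strings hash and print predictably.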