# Columbia Attention Token - EDA and Wrangling of Engagement Data from CU-FinTech Slack Channels

## About This Notebook...

This notebook contains code written for exploratory analysis and wrangling of data collected from the Slack communication channels of the Columbia University FinTech program. The analysis began with downloading and collecting raw data files in .json format from the Slack channels of interest (01-live-mw, 02-ask-the-class, 03-resources, and fintech), provided by the instructor. The Slack API is a good method for collecting this data, but without 'owner' or 'administrator' permissions we instead requested, as a workaround, zip files that had been previewed for sensitive information.

The raw data was parsed into folders according to the Slack channel of origin, then concatenated into a single DataFrame. With this step completed, assessment could begin. After assessment, some columns were selected to be dropped, as they did not pertain to the focus of the analysis. Once the unnecessary columns were dropped, the process of cleaning the data began. A series of cleaning functions was defined to carry out tasks such as filtering out rows that have subtype values, datetime wrangling (which gives a breakdown of engagement traffic at various increments of time), extracting attachments and links, counting text length from posts and comments, and counting and identifying emojis used as reactions. The cleaning functions are then called and the cleaned DataFrame is reviewed.

Once cleaning is accomplished, a function that carries out feature engineering is defined and called, giving us a cleaned and workable DataFrame for analysis of student, TA, and instructor engagement with the Slack channels over the program's duration. The final step is to write this DataFrame to a .csv file for import into future notebooks.

### Imports, initial DataFrame construction, and data assessment

In [60]:
# Imports
import os, json
import pandas as pd
import numpy as np
import glob
import csv
pd.options.mode.chained_assignment=None

#### Loading multiple JSON files into a single DataFrame

In [61]:
main_folder_path = './slack_channels_data'

def parse_all_json(main_folder_path):
    slack_df = pd.DataFrame()
    
    # Iterate through the group of folders
    for folder in os.listdir(main_folder_path):
        folder_path = os.path.join(main_folder_path, folder)
        
        if os.path.isdir(folder_path):
            # Iterate through each individual folder
            for file in os.listdir(folder_path):
                file = os.path.join(main_folder_path, folder, file)
            
                # Add a channel name column indicating which folder the value comes from: 01-live-mw, 02-ask-the-class, 03-resources, fintech
                if file.endswith('.json'):
                    slack_data = pd.read_json(file)
                    slack_data['channel_name'] = folder
                    slack_df = pd.concat([slack_df, slack_data])
                    
    return slack_df

slack_data = parse_all_json('./slack_channels_data/')
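
The loading step above can also be sketched with the frames collected in a list and concatenated once at the end, which avoids re-copying the growing DataFrame on every file. This is only a sketch under stated assumptions: `load_channel_exports` is a hypothetical name, the temporary folder and its single message are made-up demo data, and `ignore_index=True` produces a fresh 0..n-1 index, unlike the repeated per-file index kept by `parse_all_json`.

```python
import json
import os
import tempfile

import pandas as pd


def load_channel_exports(root):
    """Read every .json file under root/<channel>/ into one DataFrame."""
    frames = []
    for channel in sorted(os.listdir(root)):
        channel_path = os.path.join(root, channel)
        if not os.path.isdir(channel_path):
            continue
        for file_name in sorted(os.listdir(channel_path)):
            if file_name.endswith('.json'):
                frame = pd.read_json(os.path.join(channel_path, file_name))
                # Record which channel folder the rows came from
                frame['channel_name'] = channel
                frames.append(frame)
    # Concatenate once at the end instead of inside the loop
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()


# Tiny made-up export, used only to exercise the function
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, 'fintech'))
    with open(os.path.join(root, 'fintech', '2021-06-03.json'), 'w') as handle:
        json.dump([{'text': 'hello', 'user': 'U1', 'ts': 1622723000.0}], handle)
    demo_df = load_channel_exports(root)
```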

In [62]:
# Save to csv
slack_data.to_csv('slack_channels.csv', index=False)

#### Assessing the data

In [63]:
slack_data.shape

(1288, 34)

In [64]:
slack_data.head()

Unnamed: 0,client_msg_id,type,text,user,ts,team,user_team,source_team,user_profile,attachments,...,subscribed,parent_user_id,edited,purpose,x_files,hidden,bot_id,bot_profile,old_name,name
0,5bf56972-c421-4d08-8f14-1af9a35e67eb,message,two upcoming conferences:\n• finovatefall sept...,U023R27V74N,1628731000.0,T024JBZ7VTJ,T024JBZ7VTJ,T024JBZ7VTJ,"{'avatar_hash': '425d563aaa26', 'image_72': 'h...","[{'title': 'FinovateFall fintech event', 'titl...",...,,,,,,,,,,
1,2f74859f-3d4b-4b0b-a666-aaa289787dae,message,also two free virtual expos sept 29-30:\n• blo...,U023R27V74N,1628733000.0,T024JBZ7VTJ,T024JBZ7VTJ,T024JBZ7VTJ,"{'avatar_hash': '425d563aaa26', 'image_72': 'h...",[{'service_name': 'Blockchain Expo North Ameri...,...,,,,,,,,,,
0,,message,<@U023TR6SGJZ> has joined the channel,U023TR6SGJZ,1622723000.0,,,,,,...,,,,,,,,,,
1,,message,<@U023QT9923G> has joined the channel,U023QT9923G,1622733000.0,,,,,,...,,,,,,,,,,
2,,message,<@U023R27V74N> has joined the channel,U023R27V74N,1622735000.0,,,,,,...,,,,,,,,,,


In [65]:
# Reviewing column names
slack_data.columns

Index(['client_msg_id', 'type', 'text', 'user', 'ts', 'team', 'user_team',
       'source_team', 'user_profile', 'attachments', 'blocks', 'channel_name',
       'subtype', 'reactions', 'files', 'upload', 'display_as_bot',
       'thread_ts', 'reply_count', 'reply_users_count', 'latest_reply',
       'reply_users', 'replies', 'is_locked', 'subscribed', 'parent_user_id',
       'edited', 'purpose', 'x_files', 'hidden', 'bot_id', 'bot_profile',
       'old_name', 'name'],
      dtype='object')

In [66]:
# Make a copy for a new DataFrame
slack_df = slack_data.copy()

In [67]:
# Review column names of new DataFrame for consistency
slack_df.columns

Index(['client_msg_id', 'type', 'text', 'user', 'ts', 'team', 'user_team',
       'source_team', 'user_profile', 'attachments', 'blocks', 'channel_name',
       'subtype', 'reactions', 'files', 'upload', 'display_as_bot',
       'thread_ts', 'reply_count', 'reply_users_count', 'latest_reply',
       'reply_users', 'replies', 'is_locked', 'subscribed', 'parent_user_id',
       'edited', 'purpose', 'x_files', 'hidden', 'bot_id', 'bot_profile',
       'old_name', 'name'],
      dtype='object')

In [68]:
# Check how many values (posts, comments) each channel has
slack_df.channel_name.value_counts()

02-ask-the-class    686
01-live-mw          439
03-resources        101
fintech              62
Name: channel_name, dtype: int64

In [69]:
# List of columns, their non-null objects and data type of columns
slack_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1288 entries, 0 to 1
Data columns (total 34 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   client_msg_id      1112 non-null   object 
 1   type               1288 non-null   object 
 2   text               1288 non-null   object 
 3   user               1288 non-null   object 
 4   ts                 1288 non-null   float64
 5   team               1082 non-null   object 
 6   user_team          1081 non-null   object 
 7   source_team        1081 non-null   object 
 8   user_profile       1081 non-null   object 
 9   attachments        130 non-null    object 
 10  blocks             1165 non-null   object 
 11  channel_name       1288 non-null   object 
 12  subtype            77 non-null     object 
 13  reactions          318 non-null    object 
 14  files              129 non-null    object 
 15  upload             129 non-null    object 
 16  display_as_bot     115 non-

In [70]:
# Check if there are any null values in the dataset
slack_df.isna().mean().round(4) * 100

client_msg_id        13.66
type                  0.00
text                  0.00
user                  0.00
ts                    0.00
team                 15.99
user_team            16.07
source_team          16.07
user_profile         16.07
attachments          89.91
blocks                9.55
channel_name          0.00
subtype              94.02
reactions            75.31
files                89.98
upload               89.98
display_as_bot       91.07
thread_ts            34.86
reply_count          85.40
reply_users_count    85.40
latest_reply         85.40
reply_users          85.40
replies              85.40
is_locked            85.40
subscribed           85.40
parent_user_id       49.46
edited               96.82
purpose              99.77
x_files              99.84
hidden               99.92
bot_id               99.92
bot_profile          99.92
old_name             99.92
name                 99.92
dtype: float64

#### Note:

The missing values in the dataset do not correspond to missing data, but rather the fact that some posts do not have replies, attachments, or reactions. For the task at hand, what is most important is that the 'text' and 'user' columns do not have missing values.

With these boxes checked, we can begin a deeper dive into assessing the rows to identify what we don't need.
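
The check described in this note can be made explicit. The sketch below uses a small made-up frame (`demo` and `required` are illustrative names, not part of the notebook) and asserts that the required columns are complete while optional columns such as `reactions` are allowed to be empty.

```python
import numpy as np
import pandas as pd

# Made-up rows: every post has text and a user, but only one has reactions
demo = pd.DataFrame({
    'text': ['hi', 'joined the channel', 'see link'],
    'user': ['U1', 'U2', 'U1'],
    'reactions': [np.nan, np.nan, [{'name': '+1', 'count': 1}]],
})

required = ['text', 'user']
missing_required = demo[required].isna().sum()

# Fail loudly if a required column has gaps; optional columns may be NaN
assert (missing_required == 0).all(), 'text/user should never be missing'
```

In the notebook itself, the same two lines could be run against `slack_df` in place of `demo`.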

In [71]:
# Count the values of the 'subtype' column
slack_df['subtype'].value_counts()

channel_join       72
channel_purpose     3
tombstone           1
channel_name        1
Name: subtype, dtype: int64

### Summary of Data Assessment

#### Columns to drop:

- type, team, user_team, source_team, latest_reply, last_read, bot_id, bot_profile, display_as_bot, topic, blocks, edited, is_locked, subscribed, upload, root, purpose, thread_ts, parent_user_id

#### Columns to clean & wrangle:

- subtype: filter out its values from df, remove the original column
- ts: change it to datetime, remove milliseconds, get days of the week, months of the year, type of the day, parts of the day
- user_profile: extract real_name in new column, remove the original
- attachments: extract title, text, link in new columns
- files: extract url_private and who shared
- reactions: extract user, count, name of the emoji
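
As an illustration of the `files` item in this list, here is a minimal sketch of pulling `url_private` and the sharing user out of the first file entry. `first_file_info` is a hypothetical helper, and reading the sharer from a `user` key on the file entry is an assumption about the export format rather than something shown in this notebook.

```python
def first_file_info(files):
    """Return (url_private, user) for the first file entry, or (None, None)."""
    # Posts without files hold NaN (a float), so only lists are inspected
    if isinstance(files, list) and files and isinstance(files[0], dict):
        return files[0].get('url_private'), files[0].get('user')
    return None, None


# Made-up values for illustration only
with_file = first_file_info([{'url_private': 'https://example.com/f', 'user': 'U1'}])
without_file = first_file_info(float('nan'))
```

The helper could be applied to the column with `slack_df_clean['files'].apply(first_file_info)`.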

### Preparing for Cleaning

In [72]:
# Make a copy of the DataFrame for the cleaning process
slack_df_clean = slack_df.copy()

In [73]:
# Review the DataFrame
slack_df_clean.head()

Unnamed: 0,client_msg_id,type,text,user,ts,team,user_team,source_team,user_profile,attachments,...,subscribed,parent_user_id,edited,purpose,x_files,hidden,bot_id,bot_profile,old_name,name
0,5bf56972-c421-4d08-8f14-1af9a35e67eb,message,two upcoming conferences:\n• finovatefall sept...,U023R27V74N,1628731000.0,T024JBZ7VTJ,T024JBZ7VTJ,T024JBZ7VTJ,"{'avatar_hash': '425d563aaa26', 'image_72': 'h...","[{'title': 'FinovateFall fintech event', 'titl...",...,,,,,,,,,,
1,2f74859f-3d4b-4b0b-a666-aaa289787dae,message,also two free virtual expos sept 29-30:\n• blo...,U023R27V74N,1628733000.0,T024JBZ7VTJ,T024JBZ7VTJ,T024JBZ7VTJ,"{'avatar_hash': '425d563aaa26', 'image_72': 'h...",[{'service_name': 'Blockchain Expo North Ameri...,...,,,,,,,,,,
0,,message,<@U023TR6SGJZ> has joined the channel,U023TR6SGJZ,1622723000.0,,,,,,...,,,,,,,,,,
1,,message,<@U023QT9923G> has joined the channel,U023QT9923G,1622733000.0,,,,,,...,,,,,,,,,,
2,,message,<@U023R27V74N> has joined the channel,U023R27V74N,1622735000.0,,,,,,...,,,,,,,,,,


In [74]:
# Review the column names
slack_df_clean.columns

Index(['client_msg_id', 'type', 'text', 'user', 'ts', 'team', 'user_team',
       'source_team', 'user_profile', 'attachments', 'blocks', 'channel_name',
       'subtype', 'reactions', 'files', 'upload', 'display_as_bot',
       'thread_ts', 'reply_count', 'reply_users_count', 'latest_reply',
       'reply_users', 'replies', 'is_locked', 'subscribed', 'parent_user_id',
       'edited', 'purpose', 'x_files', 'hidden', 'bot_id', 'bot_profile',
       'old_name', 'name'],
      dtype='object')

### Cleaning the Data

In [75]:
# Creating a cleaning function
def clean_dataframe(slack_df_clean):
    
    # Drop the columns that aren't needed
    slack_df_clean.drop(['type', 'client_msg_id', 'team', 'user_team',
             'source_team', 'blocks', 'upload', 'display_as_bot',
             'thread_ts', 'latest_reply', 'is_locked', 'subscribed',
             'parent_user_id', 'bot_id', 'bot_profile', 'edited',
             'purpose', 'old_name', 'name', 'hidden',
             'x_files'], axis=1, inplace=True)
    
    # Filter out the rows whose subtype marks a system message
    slack_df_clean = slack_df_clean[(slack_df_clean.subtype != 'channel_join') &
                                    (slack_df_clean.subtype != 'channel_purpose') &
                                    (slack_df_clean.subtype != 'thread_broadcast')
                                   ]
    
    # Drop the subtype column, whose values we don't need anymore
    slack_df_clean.drop('subtype', axis=1, inplace=True)
    
    return slack_df_clean
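
The subtype filter in `clean_dataframe` can be sketched without in-place mutation by using `isin`. The frame below is made-up demo data, and `system_subtypes` is an illustrative name holding the same three values the function excludes.

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame({
    'text': ['<@U1> has joined the channel',
             'any tips for the homework?',
             'set the channel purpose'],
    'subtype': ['channel_join', np.nan, 'channel_purpose'],
})

system_subtypes = ['channel_join', 'channel_purpose', 'thread_broadcast']

# Keep rows whose subtype is not in the list (NaN rows are kept),
# then drop the column on the filtered copy
posts = demo[~demo['subtype'].isin(system_subtypes)].drop(columns='subtype')
```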

### Wrangling the Data

In [76]:
# Create a function to summarise the wrangling steps with datetime
def datetime_wrangling(slack_df_clean):
    
    # Convert ts to datetime from float
    slack_df_clean['ts'] = pd.to_datetime(slack_df_clean['ts'], unit='s').astype('datetime64[s]')
    
    # Create a column for the days of the week using the ts column
    slack_df_clean['day_name'] = slack_df_clean['ts'].dt.day_name()
    slack_df_clean['day_number'] = pd.DatetimeIndex(slack_df_clean['ts']).day
    
    # Create a column for the months of the year using the ts column
    slack_df_clean['month'] = pd.DatetimeIndex(slack_df_clean['ts']).month
    
    # Convert values to date time and the month names
    slack_df_clean['month'] = pd.to_datetime(slack_df_clean['month'], format='%m').dt.month_name()
    
    # Create a column for the type of day (weekday or weekend) using the ts column
    slack_df_clean['day_type'] = slack_df_clean.ts.dt.weekday.apply(
    lambda x: 'Weekday' if x<5 else 'Weekend')
    
    # Create a column for the hour of the day using the ts column
    slack_df_clean['time'] = slack_df_clean['ts'].dt.strftime('%H')
    
    # Create a column for the parts of the day (six 4-hour buckets);
    # assign the mapped result back instead of replacing in place on a column selection
    slack_df_clean['dayparts'] = ((slack_df_clean['ts'].dt.hour % 24 + 4) // 4).map(
        {1: 'Late Night',
         2: 'Early Morning',
         3: 'Morning',
         4: 'Afternoon',
         5: 'Evening',
         6: 'Night'})
    
    # Drop the ts column
    slack_df_clean.drop('ts', axis=1, inplace=True)
    
    return slack_df_clean
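
To make the `dayparts` arithmetic concrete: `(hour % 24 + 4) // 4` splits the day into six 4-hour buckets numbered 1 to 6. The sketch below reproduces that formula in plain Python; `DAYPART_LABELS` and `daypart` are illustrative names only.

```python
# Same bucket labels as in datetime_wrangling
DAYPART_LABELS = {
    1: 'Late Night',     # 00:00-03:59
    2: 'Early Morning',  # 04:00-07:59
    3: 'Morning',        # 08:00-11:59
    4: 'Afternoon',      # 12:00-15:59
    5: 'Evening',        # 16:00-19:59
    6: 'Night',          # 20:00-23:59
}


def daypart(hour):
    """Map an hour of the day (0-23) to its 4-hour bucket label."""
    return DAYPART_LABELS[(hour % 24 + 4) // 4]


examples = {hour: daypart(hour) for hour in (1, 7, 12, 18, 22)}
```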

In [77]:
# Create a function to extract links from columns
def return_attachments(txt):
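    try:
        # txt is a list of attachment dicts; NaN (no attachments) raises TypeError
        dictionary = txt[0]
        # Fall back to the string 'None' when the first attachment has no link
        return dictionary.get('original_url', 'None')
    except (TypeError, IndexError, AttributeError):
        return 'None'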
        
slack_df_clean['attachments'] = slack_df_clean['attachments'].apply(return_attachments)

# slack_df_clean.to_csv('slack_attachments.csv', columns=header, index=False)


In [78]:
# Create function to extract real_name from user_profile
#def real_name(x):
    #if x != x:
        #return 'noname'
    #else:
        #return x['real_name']
    
#slack_df_clean['real_name'] = slack_df_clean['user_profile'].apply(real_name)

# Drop the user_profile column
slack_df_clean.drop('user_profile', axis=1, inplace=True)
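
The commented-out `real_name` helper above relies on the `x != x` test for NaN. A minimal alternative sketch checks for a dictionary instead, which also covers profiles that lack the key. The `default` parameter is an addition for illustration, and the helper would need to run before `user_profile` is dropped.

```python
def real_name(profile, default='noname'):
    """Return the real_name stored in a user_profile dict, or a default."""
    # Rows without a profile hold NaN (a float), not a dict
    if isinstance(profile, dict):
        return profile.get('real_name', default)
    return default


# Made-up values for illustration only
named = real_name({'real_name': 'Jane Doe', 'avatar_hash': 'abc'})
unnamed = real_name(float('nan'))
```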

In [79]:
# Create a function to count reactions in columns
def reactions_count(txt):
    # Default to 0 when the post has no reactions (NaN)
    count = 0
    if str(txt) != "nan":
        # Count of the first reaction entry only
        count = txt[0].get('count')
    return count
    
slack_df_clean['reactions_count'] = slack_df_clean['reactions'].apply(reactions_count)

In [80]:
# Create a function to indicate the name of the emoji related to the reactions
def reactions_name(txt):
    # Default to 0 when the post has no reactions (NaN)
    name = 0
    if str(txt) != "nan":
        # Emoji name of the first reaction entry only
        name = txt[0].get('name')
    return name

slack_df_clean['reactions_name'] = slack_df_clean['reactions'].apply(reactions_name)

In [81]:
# Create a function to indicate the user ID related to the reactions (and name)

def reactions_user(txt):
    # Default to 0 when the post has no reactions (NaN)
    users = 0
    if str(txt) != "nan":
        # User IDs attached to the first reaction entry only
        users = txt[0].get('users')
    return users

slack_df_clean['reactions_user'] = slack_df_clean['reactions'].apply(reactions_user)
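
The three reaction helpers above read only the first entry of each `reactions` list, so a post with several different emojis is counted by its first emoji alone. If totals across all emojis are wanted, a sketch along the following lines could be used instead; `total_reactions` and `reaction_names` are hypothetical names, and the sample list is made up.

```python
def total_reactions(reactions):
    """Sum the 'count' field over every reaction entry on a post."""
    # Posts without reactions hold NaN (a float), not a list
    if not isinstance(reactions, list):
        return 0
    return sum(entry.get('count', 0) for entry in reactions)


def reaction_names(reactions):
    """List the emoji names used on a post, in export order."""
    if not isinstance(reactions, list):
        return []
    return [entry.get('name') for entry in reactions]


# Made-up reactions for illustration only
sample = [
    {'name': '+1', 'users': ['U1', 'U2'], 'count': 2},
    {'name': 'fire', 'users': ['U3'], 'count': 1},
]
sample_total = total_reactions(sample)
sample_names = reaction_names(sample)
```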

In [82]:
# Review the new columns in the dataframe
slack_df_clean.head(20)

Unnamed: 0,client_msg_id,type,text,user,ts,team,user_team,source_team,attachments,blocks,...,purpose,x_files,hidden,bot_id,bot_profile,old_name,name,reactions_count,reactions_name,reactions_user
0,5bf56972-c421-4d08-8f14-1af9a35e67eb,message,two upcoming conferences:\n• finovatefall sept...,U023R27V74N,1628731000.0,T024JBZ7VTJ,T024JBZ7VTJ,T024JBZ7VTJ,https://informaconnect.com/finovatefall/,"[{'type': 'rich_text', 'block_id': 'XF1K', 'el...",...,,,,,,,,0,0,0
1,2f74859f-3d4b-4b0b-a666-aaa289787dae,message,also two free virtual expos sept 29-30:\n• blo...,U023R27V74N,1628733000.0,T024JBZ7VTJ,T024JBZ7VTJ,T024JBZ7VTJ,https://blockchain-expo.com/northamerica/,"[{'type': 'rich_text', 'block_id': 'FK8', 'ele...",...,,,,,,,,0,0,0
0,,message,<@U023TR6SGJZ> has joined the channel,U023TR6SGJZ,1622723000.0,,,,,,...,,,,,,,,0,0,0
1,,message,<@U023QT9923G> has joined the channel,U023QT9923G,1622733000.0,,,,,,...,,,,,,,,0,0,0
2,,message,<@U023R27V74N> has joined the channel,U023R27V74N,1622735000.0,,,,,,...,,,,,,,,0,0,0
0,0affe36c-6e47-411d-a2f9-2c8be07d0820,message,the columbia center of ai is having a symposiu...,U023R27V74N,1633027000.0,T024JBZ7VTJ,T024JBZ7VTJ,T024JBZ7VTJ,https://www.eventbrite.com/e/cait-inaugural-sy...,"[{'type': 'rich_text', 'block_id': '4VM', 'ele...",...,,,,,,,,1,muscle,[U025DPVSGBT]
0,DA2A9773-68A6-4B42-B1F5-0AB1C0606CDE,message,"Y’all, free diy NFT class kicking off at 12pm ...",U025DPVSGBT,1632670000.0,T024JBZ7VTJ,T024JBZ7VTJ,T024JBZ7VTJ,https://buildspace.so/build-nfts,"[{'type': 'rich_text', 'block_id': '63P', 'ele...",...,,,,,,,,3,+1,"[U024J6725K8, U025GLTLJBE, U025E9FJVNU]"
0,7FE32927-8069-448D-B5B1-A9FB66082AFD,message,Anyone interested in next generation APIs here...,U024R294XHV,1633523000.0,T024JBZ7VTJ,T024JBZ7VTJ,T024JBZ7VTJ,https://www.sigtech.com/platform/data,"[{'type': 'rich_text', 'block_id': 'XbCc', 'el...",...,,,,,,,,2,fire,"[U025DPVSGBT, U023R27V74N]"
0,fb703761-514c-4e85-a41c-81c8306a145b,message,<https://medium.com/derivadex/what-are-perpetu...,U025DPVSGBT,1633617000.0,T024JBZ7VTJ,T024JBZ7VTJ,T024JBZ7VTJ,https://medium.com/derivadex/what-are-perpetua...,"[{'type': 'rich_text', 'block_id': '6Rw5', 'el...",...,,,,,,,,1,orange_heart,[U024GNX9CGM]
0,cf458988-7dac-47c6-9782-c798a4775ab1,message,<https://www.fastcompany.com/90669744/spotify-...,U023R27V74N,1630342000.0,T024JBZ7VTJ,T024JBZ7VTJ,T024JBZ7VTJ,https://www.fastcompany.com/90669744/spotify-t...,"[{'type': 'rich_text', 'block_id': 'UriP', 'el...",...,,,,,,,,0,0,0


In [83]:
# Create a function to create a new column with boolean features
def boolean_features(slack_df_clean):
    
    # Create a new boolean column if comment has reaction
    #slack_df_clean['reaction_true'] = slack_df_clean['reactions_name'].notna()
    slack_df_clean['reaction_true'] = slack_df_clean['reactions_count'].notna()
    
    # Create a new boolean column if comment has reply
    slack_df_clean['replies_true'] = slack_df_clean['reply_count'].notna()
    
    # Create a new boolean column if the comment has attachments
    slack_df_clean['attachments_true'] = slack_df_clean['attachments'].notna()
    
    return slack_df_clean
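
Because `reactions_count` is filled with 0 and `attachments` with the string 'None' for posts that have neither, `notna()` is True for every row in those two columns. A hedged sketch of stricter flags follows; it assumes that a count of 0 means "no reaction" and that the string 'None' means "no attachment", and `demo` and `flags` are made-up names.

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame({
    'reactions_count': [0, 3],
    'reply_count': [np.nan, 2.0],
    'attachments': ['None', 'https://example.com'],
})

flags = pd.DataFrame({
    # A reaction exists only when the count is positive
    'reaction_true': demo['reactions_count'] > 0,
    # reply_count stays NaN for posts without replies, so notna() works here
    'replies_true': demo['reply_count'].notna(),
    # Treat the placeholder string 'None' as "no attachment"
    'attachments_true': demo['attachments'].notna() & (demo['attachments'] != 'None'),
})
```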

In [84]:
# Create a function to create a new column with text length
def text_length(slack_df_clean):
    slack_df_clean['text_length'] = slack_df_clean['text'].astype(str).map(len)
    
    return slack_df_clean

In [85]:
# Get info on the cleaned DataFrame
slack_df_clean.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1288 entries, 0 to 1
Data columns (total 36 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   client_msg_id      1112 non-null   object 
 1   type               1288 non-null   object 
 2   text               1288 non-null   object 
 3   user               1288 non-null   object 
 4   ts                 1288 non-null   float64
 5   team               1082 non-null   object 
 6   user_team          1081 non-null   object 
 7   source_team        1081 non-null   object 
 8   attachments        1288 non-null   object 
 9   blocks             1165 non-null   object 
 10  channel_name       1288 non-null   object 
 11  subtype            77 non-null     object 
 12  reactions          318 non-null    object 
 13  files              129 non-null    object 
 14  upload             129 non-null    object 
 15  display_as_bot     115 non-null    object 
 16  thread_ts          839 non-

In [86]:
# Call the clean_dataframe function
slack_df_clean = clean_dataframe(slack_df_clean)

In [87]:
# Call the datetime_wrangling function
slack_df_clean = datetime_wrangling(slack_df_clean)

In [88]:
# Call the boolean_features function
slack_df_clean = boolean_features(slack_df_clean)

In [89]:
# Call the text_length function
slack_df_clean = text_length(slack_df_clean)

In [90]:
# Review the DataFrame
slack_df_clean.head(20)

Unnamed: 0,text,user,attachments,channel_name,reactions,files,reply_count,reply_users_count,reply_users,replies,...,day_name,day_number,month,day_type,time,dayparts,reaction_true,replies_true,attachments_true,text_length
0,two upcoming conferences:\n• finovatefall sept...,U023R27V74N,https://informaconnect.com/finovatefall/,fintech,,,,,,,...,Thursday,12,August,Weekday,1,Late Night,True,False,True,415
1,also two free virtual expos sept 29-30:\n• blo...,U023R27V74N,https://blockchain-expo.com/northamerica/,fintech,,,,,,,...,Thursday,12,August,Weekday,1,Late Night,True,False,True,168
0,the columbia center of ai is having a symposiu...,U023R27V74N,https://www.eventbrite.com/e/cait-inaugural-sy...,fintech,"[{'name': 'muscle', 'users': ['U025DPVSGBT'], ...",,,,,,...,Thursday,30,September,Weekday,18,Evening,True,False,True,170
0,"Y’all, free diy NFT class kicking off at 12pm ...",U025DPVSGBT,https://buildspace.so/build-nfts,fintech,"[{'name': '+1', 'users': ['U024J6725K8', 'U025...",,,,,,...,Sunday,26,September,Weekend,15,Afternoon,True,False,True,120
0,Anyone interested in next generation APIs here...,U024R294XHV,https://www.sigtech.com/platform/data,fintech,"[{'name': 'fire', 'users': ['U025DPVSGBT', 'U0...",,,,,,...,Wednesday,6,October,Weekday,12,Afternoon,True,False,True,206
0,<https://medium.com/derivadex/what-are-perpetu...,U025DPVSGBT,https://medium.com/derivadex/what-are-perpetua...,fintech,"[{'name': 'orange_heart', 'users': ['U024GNX9C...",,,,,,...,Thursday,7,October,Weekday,14,Afternoon,True,False,True,208
0,<https://www.fastcompany.com/90669744/spotify-...,U023R27V74N,https://www.fastcompany.com/90669744/spotify-t...,fintech,,,,,,,...,Monday,30,August,Weekday,16,Evening,True,False,True,66
0,if you are interested in learning how azure su...,U023R27V74N,https://info.microsoft.com/ww-landing-use-ai-t...,fintech,,,,,,,...,Monday,11,October,Weekday,22,Night,True,False,True,644
0,Fintech Junction Summer Event\nThere is a free...,U023R27V74N,,fintech,"[{'name': '+1', 'users': ['U024GNX9CGM', 'U024...","[{'id': 'F025DPWJ4J3', 'created': 1624378869, ...",,,,,...,Tuesday,22,June,Weekday,16,Evening,True,False,True,107
0,MoneyNext Open Banking Summit June 22-23\nIf y...,U023R27V74N,https://moneynext.tv/open-banking-summit/,fintech,"[{'name': '+1', 'users': ['U024SCVTY5T', 'U024...",,,,,,...,Friday,18,June,Weekday,19,Evening,True,False,True,159


### Feature Engineering and a Second Cleaning Pass

In [91]:
# Create another cleaning function to drop unnecessary columns, replace None values, and reorder columns
def clean_post_feateng(slack_df_clean):
    
    # Drop unnecessary columns
    slack_df_clean.drop(['reactions', 'reply_users', 'replies'], axis=1, inplace=True)
    
    # Replace None values with 0
    slack_df_clean['reply_count'] = slack_df_clean['reply_count'].fillna(0)
    slack_df_clean['reply_users_count'] = slack_df_clean['reply_users_count'].fillna(0)
    slack_df_clean['reply_count'] = slack_df_clean['reply_count'].astype(int)
    slack_df_clean['reply_users_count'] = slack_df_clean['reply_users_count'].astype(int)
    
    # Reorder columns
    slack_df_clean = slack_df_clean[['channel_name', 'user', #'real_name',
                     'text', 'text_length', 'reply_count', 'reply_users_count',
                     'replies_true', 'day_name', 'day_type', 'time',
                     'dayparts', 'day_number', 'month', 'reactions_count', 
                     'reactions_name', 'attachments', 'attachments_true', 'reaction_true']]
    
    return slack_df_clean

In [92]:
# Call the clean_post_feateng function
slack_df_clean = clean_post_feateng(slack_df_clean)

In [93]:
# Review the cleaned DataFrame post-feature engineering
slack_df_clean.head()

Unnamed: 0,channel_name,user,text,text_length,reply_count,reply_users_count,replies_true,day_name,day_type,time,dayparts,day_number,month,reactions_count,reactions_name,attachments,attachments_true,reaction_true
0,fintech,U023R27V74N,two upcoming conferences:\n• finovatefall sept...,415,0,0,False,Thursday,Weekday,1,Late Night,12,August,0,0,https://informaconnect.com/finovatefall/,True,True
1,fintech,U023R27V74N,also two free virtual expos sept 29-30:\n• blo...,168,0,0,False,Thursday,Weekday,1,Late Night,12,August,0,0,https://blockchain-expo.com/northamerica/,True,True
0,fintech,U023R27V74N,the columbia center of ai is having a symposiu...,170,0,0,False,Thursday,Weekday,18,Evening,30,September,1,muscle,https://www.eventbrite.com/e/cait-inaugural-sy...,True,True
0,fintech,U025DPVSGBT,"Y’all, free diy NFT class kicking off at 12pm ...",120,0,0,False,Sunday,Weekend,15,Afternoon,26,September,3,+1,https://buildspace.so/build-nfts,True,True
0,fintech,U024R294XHV,Anyone interested in next generation APIs here...,206,0,0,False,Wednesday,Weekday,12,Afternoon,6,October,2,fire,https://www.sigtech.com/platform/data,True,True


In [94]:
# Save to csv
slack_df_clean.to_csv('slack_cleaned.csv', index=False)
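
For the follow-up notebooks, reading the file back is a single `pd.read_csv` call. The round trip below is a sketch on a made-up one-row frame written to a temporary folder, so it does not touch the real `slack_cleaned.csv`.

```python
import os
import tempfile

import pandas as pd

# Made-up row with a few of the cleaned columns
demo = pd.DataFrame({
    'channel_name': ['fintech'],
    'text_length': [415],
    'reaction_true': [True],
})

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'slack_cleaned.csv')
    # index=False keeps the repeated per-file index out of the CSV
    demo.to_csv(path, index=False)
    reloaded = pd.read_csv(path)
```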