## Part 2: Data Wrangling

I received responses from four sources: Craigslist, Nextdoor, Facebook, and sublet.com. The communication channel and file format for each source are:
* Craigslist: email and text message
    * Format: mbox for email, Excel for text messages
* Nextdoor: Nextdoor inbox
    * Format: Excel
* Facebook: Facebook messages
    * Format: JSON file for each conversation
* sublet.com: sublet.com inbox
    * Format: Excel

Here I import all the data and make sure the column titles and data formats match across sources. 
I then concatenate the data from the different sources into one master dataframe for later analysis and visualization. 

###  Section 1

In [1]:
import pandas as pd
import numpy as np
from datetime import datetime
from datetime import date

In [2]:
# Get all the previously prepared data (raw strings keep the Windows backslashes literal):
facebook = pd.read_csv(r'E:\Data_Science_Coursework\Rental Data Analysis\ facebook.csv')
facebook_post2 = pd.read_csv(r'E:\Data_Science_Coursework\Rental Data Analysis\ facebook_post.csv')
email = pd.read_csv(r'E:\Data_Science_Coursework\Rental Data Analysis\email.csv')

Next, I make sure the column headers are uniform across all sources. They will be 'Name', 'Content', 'Visit', 'Source', 'Date', 'Time', 'Weekday'.
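One way to check this before concatenating is to compare each dataframe's columns against the target list. The sketch below is illustrative only: the helper name `missing_columns` and the toy dataframe are hypothetical and are not part of the original notebook.

```python
import pandas as pd

# Target column headers, taken from the text above.
EXPECTED = ['Name', 'Content', 'Visit', 'Source', 'Date', 'Time', 'Weekday']

def missing_columns(df, expected=EXPECTED):
    """Return the expected column names that df does not have (hypothetical helper)."""
    return [col for col in expected if col not in df.columns]

# Toy dataframe standing in for one of the real sources (made-up values).
toy = pd.DataFrame({'Name': ['A'], 'Text': ['hello'], 'Date': ['2020-05-20']})
toy = toy.rename(columns={'Text': 'Content'})

# Lists the columns that still have to be added before concatenation.
print(missing_columns(toy))
```

Running the check on every source dataframe shows at a glance which columns still need to be renamed or created.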

In [3]:
# Take a look at the first 5 messages from Facebook.
facebook.head()

Unnamed: 0.1,Unnamed: 0,Name,Content,Visit,Source,Date,Time,Weekday
0,0,Artavia Price,I understand. Thanks anyway July would be the...,False,Facebook,2020-05-12,07:02:42.802000,1.0
1,1,Ashley Johnson,Yes. Please contact 832-985-4791 Is this stil...,False,Facebook,2020-05-18,09:53:18.357000,0.0
2,2,Ashley Ann Caldwell,7153935574 is me just in case Sounds good Tom...,True,Facebook,2020-05-10,15:15:27.126000,6.0
3,3,Ashley Renee,Thank you That's crazy but i get it But if yo...,False,Facebook,2020-05-17,16:47:17.766000,6.0
4,4,Eden Ciarra,Only a year. I’m moving from out of state f...,False,Facebook,2020-05-17,10:07:41.969000,6.0


In [4]:
# Relabel the post types, then look at the first 5 records of Facebook post creations and updates. 
facebook_post2.rename(columns = {"Type": "Source"}, inplace = True)
facebook_post2.loc[facebook_post2['Source'] == 'Creation', 'Source'] = 'Facebook_Creation'
facebook_post2.loc[facebook_post2['Source'] == 'Update', 'Source'] = 'Facebook_Update'
facebook_post2.head()

Unnamed: 0.1,Unnamed: 0,Date,Time,Source,Weekday
0,0,2020-05-16,13:54:11,Facebook_Creation,5.0
1,1,2020-05-16,13:53:58,Facebook_Creation,5.0
2,2,2020-05-09,14:22:18,Facebook_Creation,5.0
3,3,2020-05-09,08:54:24,Facebook_Creation,5.0
4,4,2020-04-18,09:24:20,Facebook_Creation,5.0


In [5]:
# To unify the entries, convert weekday strings into numbers (Monday = 0, ..., Sunday = 6). 
email.loc[email['Weekday'] == 'Mon', 'Weekday'] = 0.0
email.loc[email['Weekday'] == 'Tue', 'Weekday'] = 1.0
email.loc[email['Weekday'] == 'Wed', 'Weekday'] = 2.0
email.loc[email['Weekday'] == 'Thu', 'Weekday'] = 3.0
email.loc[email['Weekday'] == 'Fri', 'Weekday'] = 4.0
email.loc[email['Weekday'] == 'Sat', 'Weekday'] = 5.0
email.loc[email['Weekday'] == 'Sun', 'Weekday'] = 6.0

email.head()

Unnamed: 0.1,Unnamed: 0,Name,Weekday,Date,Time,Source,Content,Visit
0,0,"['craigslist - automated message, do not reply']",1,2020-03-24,17:20:34,Craigslist_Update,b'\nIMPORTANT - FURTHER ACTION IS REQUIRED TO ...,False
1,1,['Aarice Dumas'],5,2020-05-09,16:42:16,Craigslist,b'https://houston.craigslist.org/sub/d/houston...,False
2,2,['Jessie Williamson'],5,2020-04-18,18:08:37,Craigslist,"b'\nhi, i have a 6 pound chiweenie. i need him...",False
3,3,['Zoe Nanson'],4,2020-05-15,21:17:36,Craigslist,"b'Hello! My name is Zoe, and my roommate and I...",False
4,4,"['craigslist - automated message, do not reply']",6,2020-01-26,03:59:05,Craigslist_Update,b'\nIMPORTANT - FURTHER ACTION IS REQUIRED TO ...,False
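The seven assignments above can also be written as a single dictionary lookup. This is a minimal sketch on made-up data, assuming the same Monday = 0 to Sunday = 6 convention; the names `WEEKDAY_NUMBER` and `toy` are hypothetical.

```python
import pandas as pd

# Abbreviation -> number, following the Monday=0 ... Sunday=6 convention.
WEEKDAY_NUMBER = {'Mon': 0.0, 'Tue': 1.0, 'Wed': 2.0, 'Thu': 3.0,
                  'Fri': 4.0, 'Sat': 5.0, 'Sun': 6.0}

# Toy data (made up); the last entry is already numeric.
toy = pd.DataFrame({'Weekday': ['Mon', 'Sat', 'Sun', 3.0]})

# Look up each label in the dictionary and keep any value that is not a label.
toy['Weekday'] = toy['Weekday'].map(lambda d: WEEKDAY_NUMBER.get(d, d))

print(toy['Weekday'].tolist())
```

Values that are already numeric pass through unchanged, so the lookup is safe to run on a column that mixes both forms.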


In [6]:
# Now import the inquiry history from the Nextdoor inbox, the sublet.com inbox, and text messages into pandas dataframes. 
nextdoor = pd.read_excel(r'E:\Data_Science_Coursework\Rental Data Analysis\Nextdoor.xlsx')
sublet_com = pd.read_excel(r'E:\Data_Science_Coursework\Rental Data Analysis\sublet_com.xlsx')
text = pd.read_excel(r'E:\Data_Science_Coursework\Rental Data Analysis\Text.xlsx')

In [7]:
# Again, make sure all entries are unified for later concatenation. Nobody from Nextdoor ended up visiting. 
nextdoor.rename(columns = {"Text": "Content"}, inplace = True)
nextdoor['Visit'] = False
nextdoor['Source'] = 'Nextdoor'
for i in range(len(nextdoor['Date'])):
    nextdoor.loc[i, 'Weekday'] = nextdoor.loc[i, 'Date'].weekday()
    
nextdoor.head()

Unnamed: 0,Name,Date,Time,Content,Visit,Source,Weekday
0,Vaitiare,2020-05-20,,"Hello! Nice apartament i’am interested, can yo...",False,Nextdoor,2.0
1,LaQuita,2020-05-14,,Good afternoon. I am interested in renting thi...,False,Nextdoor,3.0
2,Liz Figueroa,2020-05-11,,Is this apartment still available?,False,Nextdoor,0.0
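The row-by-row weekday loop used here (and again for the sublet.com and text-message data) can be replaced by a vectorized call. This is a sketch under the assumption that the `Date` column has a datetime dtype, which is typical for `pd.read_excel` but is not confirmed by the notebook. The toy dates are copied from the Nextdoor output above.

```python
import pandas as pd

# Toy dataframe using the three dates shown in the Nextdoor output.
toy = pd.DataFrame({'Date': pd.to_datetime(['2020-05-20', '2020-05-14', '2020-05-11'])})

# .dt.weekday returns Monday=0 ... Sunday=6 for the whole column at once.
# The cast to float mirrors the float values produced by the loop version.
toy['Weekday'] = toy['Date'].dt.weekday.astype(float)

print(toy['Weekday'].tolist())
```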


In [8]:
# Message data from sublet.com. 
sublet_com.rename(columns = {"content": "Content"}, inplace = True)
sublet_com['Visit'] = False
sublet_com['Source'] = 'sublet.com'

for i in range(len(sublet_com['Date'])):
    sublet_com.loc[i, 'Weekday'] = sublet_com.loc[i, 'Date'].weekday()
    
sublet_com.head()

Unnamed: 0,Name,Date,Content,Visit,Source,Weekday
0,Crystal,2020-03-17,Where is the apt located? Is a sublet through ...,False,sublet.com,1.0
1,Zarmeena,2020-05-05,"Hello Cecilia, Im looking to move into a new a...",False,sublet.com,1.0
2,Chasity,2020-05-05,Hi yes Im very interested. Thank you so much f...,False,sublet.com,1.0
3,Curley,2020-05-10,Is this apartment still available for sub lease?,False,sublet.com,6.0
4,Sydney,2020-05-13,Hi can you please text me more info?,False,sublet.com,2.0


In [9]:
# Message data from texts. Some are from Facebook, and some are from Craigslist. 
text.rename(columns= {'keyword':'Content','visiting':'Visit'}, inplace = True)

for i in range(len(text['Date'])):
    text.loc[i, 'Weekday'] = text.loc[i, 'Date'].weekday()

text.head()

Unnamed: 0,Name,Date,Time,Source,Content,Visit,Weekday
0,Frank,2020-05-18,10:37:00,Facebook,,True,0.0
1,Old lady,2020-05-19,16:40:00,Craigslist,,False,1.0
2,Daniel,2020-05-18,19:05:00,Craigslist,,False,0.0
3,student,2020-05-18,22:32:00,Craigslist,,False,0.0
4,Gage,2020-05-16,23:53:00,Craigslist,,False,5.0


Now I need to concatenate all the dataframes. 

In [10]:
# Concatenate into one dataframe called master. 
master = pd.concat([facebook,facebook_post2, email, nextdoor, sublet_com, text], ignore_index=True, sort=False)

In [11]:
# Further clean the data
master.drop(columns = 'Unnamed: 0', inplace = True)
master.dropna(subset = ['Date'], axis = 0, inplace = True)
master.reset_index(drop = True, inplace = True)
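To illustrate what the concatenation and cleanup do, here is a small sketch on made-up data (the dataframes `messages` and `posts` are hypothetical). Columns that exist in only one input are filled with NaN in the other rows, which is why post-creation rows end up with no `Name`, `Content`, or `Visit`.

```python
import pandas as pd

# Toy inputs: the posts dataframe has no 'Name' column and one row without a date.
messages = pd.DataFrame({'Name': ['A'], 'Source': ['Facebook'], 'Date': ['2020-05-12']})
posts = pd.DataFrame({'Source': ['Facebook_Creation', 'Facebook_Update'],
                      'Date': ['2020-05-16', None]})

# Stack the rows; columns missing from one input are filled with NaN.
combined = pd.concat([messages, posts], ignore_index=True, sort=False)

# Drop rows without a date and renumber the index.
combined = combined.dropna(subset=['Date']).reset_index(drop=True)

print(combined)
```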

In [12]:
master.to_csv(r'E:\Data_Science_Coursework\Rental Data Analysis\Master.csv')

In [13]:
# Take a look at a single row (index 24).
master.iloc[24]

Name                     NaN
Content                  NaN
Visit                    NaN
Source     Facebook_Creation
Date              2020-05-16
Time                13:54:11
Weekday                    5
Name: 24, dtype: object

In [14]:
master['Source'].value_counts()

Facebook             26
Craigslist           15
Craigslist_Update     8
sublet.com            6
Facebook_Creation     5
Nextdoor              5
Facebook_Update       1
Name: Source, dtype: int64

I received inquiries from 26 people on Facebook, 15 on Craigslist, 6 on sublet.com, and 5 on Nextdoor, for a total of 52 inquiries. During this period, I updated my Craigslist postings 8 times, created 5 Facebook posts, and updated one of those posts once. 
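The totals can be reproduced from the counts rather than added by hand. This sketch copies the numbers from the `value_counts()` output above into a Series and separates inquiries from my own posting activity. The variable names are hypothetical.

```python
import pandas as pd

# Counts copied from the value_counts() output above.
counts = pd.Series({'Facebook': 26, 'Craigslist': 15, 'Craigslist_Update': 8,
                    'sublet.com': 6, 'Facebook_Creation': 5, 'Nextdoor': 5,
                    'Facebook_Update': 1})

# Labels containing 'Creation' or 'Update' record posting activity, not inquiries.
is_activity = counts.index.str.contains('Creation|Update')

inquiries = counts[~is_activity].sum()   # messages from interested people
activity = counts[is_activity].sum()     # posts created or updated

print(inquiries, activity)
```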

In [15]:
master.loc[24,'Source']

'Facebook_Creation'

In [16]:
master.head(30)

Unnamed: 0,Name,Content,Visit,Source,Date,Time,Weekday
0,Artavia Price,I understand. Thanks anyway July would be the...,False,Facebook,2020-05-12,07:02:42.802000,1
1,Ashley Johnson,Yes. Please contact 832-985-4791 Is this stil...,False,Facebook,2020-05-18,09:53:18.357000,0
2,Ashley Ann Caldwell,7153935574 is me just in case Sounds good Tom...,True,Facebook,2020-05-10,15:15:27.126000,6
3,Ashley Renee,Thank you That's crazy but i get it But if yo...,False,Facebook,2020-05-17,16:47:17.766000,6
4,Eden Ciarra,Only a year. I’m moving from out of state f...,False,Facebook,2020-05-17,10:07:41.969000,6
5,Ericah Cardenas,A year Ericah is waiting for your response. I...,False,Facebook,2020-05-13,01:11:30.078000,2
6,Indiana Edwards,Is this still available? Indiana changed the ...,False,Facebook,2020-05-16,08:26:56.343000,5
7,Shalmer Perro,Ok my number is +18326829569 Okay tomorrow wo...,True,Facebook,2020-05-15,17:05:25.006000,4
8,Isabel Vargas,All bills included? Isabel changed the group ...,False,Facebook,2020-05-14,20:06:20.453000,3
9,Quenique Jasmine,Jasmine is waiting for your response. Hi are ...,False,Facebook,2020-05-11,15:50:42.294000,0


Facebook produced the most responses, with 26 in total. Craigslist was also effective, with 15 responses.

In [17]:
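# Normalize the Visit column row by row.
for i in range(len(master['Source'])):
    # Post creations and updates are not inquiries, so a visit does not apply.
    if 'Creation' in str(master.loc[i,'Source']) or 'Update' in str(master.loc[i,'Source']):
        master.loc[i,'Visit'] = 'NA'
    # Convert the string 'False' into the boolean False.
    if 'False' in str(master.loc[i,'Visit']):
        master.loc[i,'Visit'] = False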

master['Visit'].value_counts()

False    43
NA       14
True      9
Name: Visit, dtype: int64
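The same normalization can be done without an explicit loop by using boolean masks. This is a sketch on made-up rows, not the notebook's actual data, and it assumes the same rules as the loop above: posting-activity rows get `'NA'`, and the string `'False'` becomes the boolean `False`.

```python
import pandas as pd

# Toy rows covering each case: boolean, string 'False', missing, and an update row.
toy = pd.DataFrame({
    'Source': ['Facebook', 'Nextdoor', 'Facebook_Creation', 'Craigslist_Update'],
    'Visit': [True, 'False', None, False],
})

# Posting-activity rows are those whose Source mentions Creation or Update.
is_activity = toy['Source'].astype(str).str.contains('Creation|Update')
toy.loc[is_activity, 'Visit'] = 'NA'

# Turn the string 'False' into the boolean False.
toy.loc[toy['Visit'].astype(str) == 'False', 'Visit'] = False

print(toy['Visit'].tolist())
```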