## NLP Project

#### Planned steps for this part of the project:
---
1. Load the CSV files in the `daily_scrape_files` folder.
2. Combine them into a single dataframe.
3. Either separate out the details and description columns, or load them separately.
4. Turn the details column into yes/no (0/1) indicator columns for later machine learning.
5. Export the new dataframe with the indicator columns to CSV or a database.
6. Run NLP on the description column to look for correlations between the description text and rent or location.
7. Export the NLP results to another file for visualization, or to a database (MongoDB here?).
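
Step 6 is not implemented yet in this notebook. As a rough first pass, the sketch below splits each description into words and compares the mean price of listings that mention a word against the overall mean price. The toy data, the `word_price_gap` helper, its `min_count` parameter, and the simple regex tokenizer are all assumptions made for illustration; a real pass would likely want stop-word removal and a proper NLP library.

```python
import re
import pandas as pd

def word_price_gap(df, text_col="description", price_col="price", min_count=2):
    """Hypothetical helper: mean price of listings mentioning a word, minus the overall mean."""
    # One set of lowercase words per listing, so a repeated word counts once per listing.
    tokens = df[text_col].fillna("").str.lower().apply(
        lambda text: sorted(set(re.findall(r"[a-z]+", text)))
    )
    # One row per (listing, word) pair; listings with no words are dropped.
    exploded = df[[price_col]].assign(word=tokens).explode("word").dropna(subset=["word"])
    stats = exploded.groupby("word")[price_col].agg(["count", "mean"])
    # Ignore words seen in fewer than min_count listings.
    stats = stats[stats["count"] >= min_count]
    stats["gap"] = stats["mean"] - df[price_col].mean()
    return stats.sort_values("gap", ascending=False)

# Toy listings, invented for this example only.
toy = pd.DataFrame({
    "description": ["Luxury downtown loft", "Luxury pool view",
                    "Cozy campus studio", "Cozy studio near campus"],
    "price": [3000, 2000, 1000, 1000],
})
gaps = word_price_gap(toy)
print(gaps)
```

A positive `gap` only says that listings using the word tend to be pricier in this sample; it does not show that the wording causes the difference.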

#### Future Hopes and Wishes
---
1. Create classes to handle related functions
2. Move classes and functions into separate files if necessary
3. Use a JSON data config to specify properties
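
As one possible shape for the JSON data config, the sketch below keeps the scrape folder and the detail keywords in a config and derives column names from the keywords using the same space and hyphen replacement the notebook applies in the Details Column section. The key names `scrape_folder` and `detail_keywords` and the `column_name` helper are hypothetical; no such config file exists in the project yet.

```python
import json

# Hypothetical config contents; the key names are assumptions, not an existing project file.
CONFIG_TEXT = """
{
  "scrape_folder": "daily_scrape_files",
  "detail_keywords": ["cats", "small dogs", "walk-in", "hot tub"]
}
"""

def column_name(keyword):
    """Turn a keyword into a column name by replacing spaces and hyphens with underscores."""
    return keyword.replace(" ", "_").replace("-", "_")

config = json.loads(CONFIG_TEXT)
# Map each keyword to the dataframe column that would hold its 0/1 flag.
flag_columns = {keyword: column_name(keyword) for keyword in config["detail_keywords"]}
print(flag_columns)
```

In practice the text would come from a file read with `json.load`, so keywords could be added without editing the notebook.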

***

## Imports

In [2]:
import numpy as np
import pandas as pd
from glob import glob

## Load files from scraped folder into dataframe

In [5]:
csv_files = glob('daily_scrape_files/*.csv')
sorted(csv_files)

['daily_scrape_files/apartments_for_rent_AustinTX_2020-01-29.csv',
 'daily_scrape_files/apartments_for_rent_AustinTX_2020-01-30.csv',
 'daily_scrape_files/apartments_for_rent_AustinTX_2020-01-31.csv',
 'daily_scrape_files/apartments_for_rent_AustinTX_2020-02-03.csv',
 'daily_scrape_files/apartments_for_rent_AustinTX_2020-02-04.csv',
 'daily_scrape_files/apartments_for_rent_AustinTX_2020-02-05.csv',
 'daily_scrape_files/apartments_for_rent_AustinTX_2020-02-06.csv']

In [20]:
df = pd.DataFrame()

In [21]:
# Read each daily scrape in date order and concatenate once, rather than growing df inside a loop.
df = pd.concat([pd.read_csv(csv_file) for csv_file in sorted(csv_files)], ignore_index=True)
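
The `Unnamed: 0` column in the loaded data suggests the scraper wrote each CSV with its dataframe index included, although that is an assumption about the scraper rather than something the notebook confirms. If so, a loader along the following lines could absorb that column and also record which file each row came from. `load_scrapes` and the `source_file` column are hypothetical names, and the demo files are invented.

```python
import tempfile
from pathlib import Path

import pandas as pd

def load_scrapes(folder):
    """Hypothetical helper: read every CSV in `folder` into one dataframe."""
    frames = []
    for path in sorted(Path(folder).glob("*.csv")):
        # index_col=0 assumes the first CSV column is a saved index (the 'Unnamed: 0' column).
        frame = pd.read_csv(path, index_col=0)
        frame["source_file"] = path.name  # remember which scrape each row came from
        frames.append(frame)
    # Concatenate once at the end; an empty folder yields an empty dataframe.
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()

# Demo with two tiny made-up files written the way the assumption describes (index included).
with tempfile.TemporaryDirectory() as tmp:
    pd.DataFrame({"name": ["A"], "price": [950]}).to_csv(Path(tmp) / "listings_2020-01-29.csv")
    pd.DataFrame({"name": ["B"], "price": [975]}).to_csv(Path(tmp) / "listings_2020-01-30.csv")
    demo = load_scrapes(tmp)
print(demo)
```

If the first column turns out to hold real data instead of a saved index, `index_col=0` should be dropped.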

In [22]:
df.sample(10)

Unnamed: 0,name,address,unit,sqft,bed,bath,price,city,state,zipcode,description,details,url,date
8493,Logans Mill Apartments,1912 E William Cannon Dr,Unit 1110,648,1.0,1.0,950,Austin,TX,78744,South Austin Living at its Finest! If location...,"Apartment ,Cats, small dogs allowed ,Built in ...",https://www.trulia.com/c/tx/austin/logans-mill...,2020-01-31
3806,"1 bed, 550 sqft, $975",926 E Dean Keeton St,1 Bed 1.0 Bath,550,1.0,1.0,975,Austin,TX,78705,North Campus One Bedroom - Tower View PRE-LEAS...,"Apartment ,Cats, small dogs allowed ,Built in ...",https://www.trulia.com/c/tx/austin/1-bed-550-s...,2020-01-29
8447,The Village At Gracy Farms Apartments,2600 Gracy Farms Ln,Unit 937,933,2.0,2.0,1440,Austin,TX,78758,Welcome Home to The Village at Gracy Farms! Th...,"Apartment ,Cats, small dogs allowed ,Built in ...",https://www.trulia.com/c/tx/austin/the-village...,2020-01-31
29087,Fivetwo At Highland Apartments,110 Jacob Fontaine,Unit 453,479,0.0,1.0,1185,Austin,TX,78752,"HI DESIGN, HI ENERGY, HIGHLAND. Designed with ...","Apartment ,Cats, small dogs allowed ,Built in ...",https://www.trulia.com/c/tx/austin/fivetwo-at-...,2020-02-06
22581,Lenox Ridge Apartments,3001 Scofield Ridge Pkwy,Unit 1411,708,1.0,1.0,1495,Austin,TX,78727,One visit to Lenox Ridge in dynamic North Aust...,"Apartment ,Cats, small dogs, large dogs allowe...",https://www.trulia.com/c/tx/austin/lenox-ridge...,2020-02-05
19669,Crescent Apartments,127 E Riverside Dr,Unit 327,392,0.0,1.0,1349,Austin,TX,78704,"Start your new chapter at Crescent Apartments,...","Apartment ,Cats, small dogs, large dogs allowe...",https://www.trulia.com/c/tx/austin/crescent-12...,2020-02-04
10411,Citadel at Tech Ridge Apartments,1127 Pearl Retreat Ln,Unit 1306,787,1.0,1.0,1335,Austin,TX,78753,Smart Architecture. Inspired Interiors. Best-o...,"Apartment ,Cats, small dogs allowed ,Kitchen I...",https://www.trulia.com/c/tx/austin/citadel-at-...,2020-01-31
20056,The Elizabeth at Presidio Apartments,13500 Lyndhurst St,Unit 1088,886,1.0,1.0,1807,Austin,TX,78717,"Experience ""True Texas Luxury"" at the newest c...","Apartment ,Cats, small dogs, large dogs allowe...",https://www.trulia.com/c/tx/austin/the-elizabe...,2020-02-04
20329,Aura Riverside Apartments,6107 E Riverside Dr,Unit 189,803,1.0,1.0,1680,Austin,TX,78741,"Located in Austin's eclectic Eastside, Aura Ri...","Apartment ,Cats, small dogs, large dogs allowe...",https://www.trulia.com/c/tx/austin/aura-rivers...,2020-02-04
22357,Marquis Parkside Apartments,12820 N Lamar Blvd,Unit 225,636,1.0,1.0,1086,Austin,TX,78753,Experience luxury living in North Austin. Marq...,"Apartment ,Cats, small dogs, large dogs allowe...",https://www.trulia.com/c/tx/austin/marquis-par...,2020-02-05


## Details Column

In [75]:
for detail in df.details.sample(5):
    print(detail)
    print('')

apartment ,, ,  allowed ,built in 1966 ,rent includes: sewage, garbage ,parking: off street ,bicycle storage ,convection oven ,laundry: shared ,online rent payment ,parking lot ,online maintenance portal ,ceiling fan ,dishwasher ,disposal ,garden ,lawn ,refrigerator ,additional storage ,floors: tile ,heating fuel: electric

apartment ,, ,  allowed ,trail ,direct access and secure controlled entry ,lounge ,pet friendly ,modern lighting in kitchen, dining and bath areas ,kitchen islands with modern pendant lighting ,relaxing bath with oval soaking tub ,pet washing station ,parking lot ,large  closets with wood shelving , ,bicycle storage

apartment ,, ,  allowed ,built in 2019 ,: $350 ,mud room style entries* ,oversized chess board and social games ,lounge ,bluetooth thermostats , countertops ,standup showers* ,resident arcade with classic arcade games and ping pong ,resident co-working space with private conference rooms ,laundry: shared ,wellness studio with free group fitness classes 

In [76]:
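# Keywords to flag in the details column. Note the comma between 'balcony' and 'fireplace':
# without it, Python joins the two adjacent string literals into 'balconyfireplace'.
list_of_details = ['cats', 'small dogs', 'large dogs', 'game room', 'ev charging', 'granite', 'gourmet', 'open living',
                  'walk-in', 'stainless', 'balcony', 'fireplace', 'pool', 'elevator', 'deposit', 'pet park', 'fitness center',
                  'club house', 'dishwasher', 'disposal', 'hot tub', 'spa']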

In [41]:
df['cats'] = df['details'].apply(lambda detail: 1 if 'cats' in detail.lower() else 0)

In [43]:
df['small_dogs'] = df['details'].apply(lambda detail: 1 if 'small dogs' in detail.lower() else 0)

In [45]:
df['large_dogs'] = df['details'].apply(lambda detail: 1 if 'large dogs' in detail.lower() else 0)

In [77]:
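for detail in list_of_details:
    detail_col_name = detail.replace(' ', '_').replace('-', '_')
    details_lower = df['details'].str.lower()
    # Flag rows whose details mention the keyword (plain substring match; missing details count as 0).
    df[detail_col_name] = details_lower.str.contains(detail, regex=False, na=False).astype(int)
    # Strip the keyword so the leftover text can be inspected; re-running this cell will then find no matches.
    df['details'] = details_lower.str.replace(detail, '', regex=False)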
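
Plain substring matching has a side effect: a short keyword such as 'spa' also matches inside longer words like 'space'. One possible alternative, sketched here, matches keywords at word boundaries and leaves the details column untouched so the cell can be re-run safely. `add_keyword_flags` is a hypothetical helper and the toy rows are adapted for illustration.

```python
import re
import pandas as pd

def add_keyword_flags(df, keywords, source_col="details"):
    """Hypothetical helper: add one 0/1 column per keyword without modifying `source_col`."""
    text = df[source_col].fillna("").str.lower()
    out = df.copy()
    for keyword in keywords:
        col = keyword.replace(" ", "_").replace("-", "_")
        # \b anchors the match at word boundaries, so 'spa' does not match 'space'.
        pattern = r"\b" + re.escape(keyword) + r"\b"
        out[col] = text.str.contains(pattern, regex=True).astype(int)
    return out

toy = pd.DataFrame({"details": [
    "Apartment ,Cats, small dogs allowed ,Dishwasher ,Spa",
    "apartment ,resident co-working space ,walk-in closets",
    None,  # a listing with no details
]})
flags = add_keyword_flags(toy, ["cats", "small dogs", "spa", "walk-in"])
print(flags[["cats", "small_dogs", "spa", "walk_in"]])
```

The trade-off is that word boundaries are stricter: the keyword 'pool' would no longer match 'pools', so plural forms would need their own entries or a looser pattern.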

In [78]:
df.sample(10)

Unnamed: 0,name,address,unit,sqft,bed,bath,price,city,state,zipcode,...,elevator,deposit,pet_park,fitness_center,club_house,walk_in,dishwasher,disposal,hot_tub,spa
2531,Marquis at Caprock Canyon Apartments,4411 Spicewood Springs Rd,Unit 2701,1071,2.0,2.0,1714,Austin,TX,78731,...,0,0,0,0,0,0,0,0,0,0
11152,Seven Apartments,615 W 7th St,The Tower - Loft,1061,1.0,1.5,3823,Austin,TX,78701,...,0,0,0,0,0,0,0,0,0,0
16354,Fivetwo At Highland Apartments,110 Jacob Fontaine,Unit 263,695,1.0,1.0,1440,Austin,TX,78752,...,0,0,0,0,0,0,0,0,0,1
11355,The Elizabeth at Presidio Apartments,13500 Lyndhurst St,Unit 2080,1258,2.0,2.0,2205,Austin,TX,78717,...,0,0,0,0,0,0,1,0,0,1
19438,Gables Republic Square Apartments,401 Guadalupe St,Unit 1809,769,1.0,1.0,2800,Austin,TX,78701,...,0,0,0,0,0,0,0,0,0,0
17591,Residences at Saltillo,1211 E 5th St,Unit 1-1215,702,1.0,1.0,1898,Austin,TX,78702,...,0,0,0,0,0,0,0,0,1,1
18334,The Davis SoCo Apartments,3809 S Congress Ave,The Bergstrom,1353,2.0,2.0,2209,Austin,TX,78704,...,0,0,0,0,0,0,0,0,0,1
24609,Fivetwo At Highland Apartments,110 Jacob Fontaine,Unit 424,1067,2.0,2.0,2270,Austin,TX,78752,...,0,0,0,0,0,0,0,0,0,1
12080,Residences at Saltillo,1211 E 5th St,Unit 2-1336,1056,1.0,1.0,2489,Austin,TX,78702,...,0,0,0,0,0,0,0,0,1,1
2066,Overture Arboretum 60+ Apartment Homes,10600 Jollyville Rd,Unit 117,1051,2.0,2.0,3200,Austin,TX,78759,...,0,0,0,0,0,0,1,0,1,1
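
Most of the flag values in a random sample are 0, so a sample alone cannot tell a rare amenity apart from a keyword that never matched. A small summary of how often each flag is set makes that distinction visible. The `flag_summary` helper and the toy frame below are assumptions for illustration.

```python
import pandas as pd

def flag_summary(df, flag_columns):
    """Hypothetical helper: count and share of rows where each 0/1 flag is set."""
    counts = df[flag_columns].sum()
    summary = pd.DataFrame({"count": counts, "share": counts / len(df)})
    return summary.sort_values("count", ascending=False)

# Toy flags, invented for this example only.
toy = pd.DataFrame({
    "dishwasher": [1, 0, 1, 0],
    "spa": [1, 1, 1, 0],
    "elevator": [0, 0, 0, 0],
})
summary = flag_summary(toy, ["dishwasher", "spa", "elevator"])
# Keywords that never matched may point to a typo in the keyword list or a re-run cell.
never_matched = summary.index[summary["count"] == 0].tolist()
print(summary)
print(never_matched)
```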
