# Apartment Rental Classified Ads

### Explore data about Classified Ads for apartment rentals in the US

**Data:** [UCI ML Repository - Apartment for rent classified](https://archive.ics.uci.edu/dataset/555/apartment+for+rent+classified)

**Description:**

- id = unique identifier of apartment
- category = category of classified
- title = title text of apartment
- body = body text of apartment
- amenities = amenities such as AC, basketball, cable, gym, internet access, pool, refrigerator, etc.
- bathrooms = number of bathrooms
- bedrooms = number of bedrooms
- currency = currency of the price
- fee = fee
- has_photo = whether the classified includes a photo of the apartment
- pets_allowed = which pets are allowed (dogs, cats, etc.)
- price = rental price of apartment
- price_display = price converted into display for reader
- price_type = price in USD
- square_feet = size of the apartment
- address = street address of the apartment
- cityname = city where the apartment is located
- state = state where the apartment is located
- latitude = latitude of the apartment's location
- longitude = longitude of the apartment's location
- source = origin of classified
- time = when classified was created

In [12]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
import os



## Load the Data

In [13]:
# load the csv file into a dataframe
raw_df = pd.read_csv('../data/raw/apartments_for_rent_classified_10k.csv', sep=';', encoding='cp1252')

In [14]:
# view the dataframe info
raw_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 22 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   id             10000 non-null  int64  
 1   category       10000 non-null  object 
 2   title          10000 non-null  object 
 3   body           10000 non-null  object 
 4   amenities      6451 non-null   object 
 5   bathrooms      9966 non-null   float64
 6   bedrooms       9993 non-null   float64
 7   currency       10000 non-null  object 
 8   fee            10000 non-null  object 
 9   has_photo      10000 non-null  object 
 10  pets_allowed   5837 non-null   object 
 11  price          10000 non-null  int64  
 12  price_display  10000 non-null  object 
 13  price_type     10000 non-null  object 
 14  square_feet    10000 non-null  int64  
 15  address        6673 non-null   object 
 16  cityname       9923 non-null   object 
 17  state          9923 non-null   object 
 18  latitud

### Analysis of Data Quality
Several columns have missing values. Of the 10,000 rows, the columns with the most missing values are pets_allowed (4,163 missing), amenities (3,549 missing), and address (3,327 missing).

- pets_allowed may be null because a null is how a listing signifies that pets are not allowed.
- amenities may be null because a null is how a listing signifies that the building has no special amenities.
- address is probably not needed, since latitude and longitude are available and are easier to process programmatically.
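The missing-value counts discussed above can be tabulated directly rather than read off the `info()` output. The following is a minimal sketch, not part of the original notebook: the helper name `missing_summary` and the `toy_df` frame are hypothetical, and the toy values are made up for illustration rather than taken from the dataset.

```python
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return per-column missing counts and percentages, most missing first."""
    missing = df.isna().sum()
    summary = pd.DataFrame({
        "missing": missing,
        "pct_missing": (missing / len(df) * 100).round(2),
    })
    # Keep only the columns that have at least one missing value
    summary = summary[summary["missing"] > 0]
    return summary.sort_values("missing", ascending=False)


# Toy frame for illustration only (made-up values, not rows from the dataset)
toy_df = pd.DataFrame({
    "amenities": ["Pool", None, None, "Gym"],
    "pets_allowed": ["Cats,Dogs", None, "Cats", "Dogs"],
    "price": [1200, 950, 1800, 1400],
})

toy_summary = missing_summary(toy_df)
print(toy_summary)
```

In the notebook, calling `missing_summary(raw_df)` should list the same columns that appear with fewer than 10,000 non-null entries in the `info()` output.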

## Explore the Columns with a High % of Missing Values

In [None]:
raw_df