# permits-data / Clean Data

ETL pipeline for construction permits data in Los Angeles, California, USA.

For more information:
https://data.lacity.org/A-Prosperous-City/Building-and-Safety-Permit-Information/yv23-pmwf

## Setup

In [269]:
import os
import sys

# Set path for modules
sys.path[0] = '../'

from dotenv import load_dotenv, find_dotenv
import numpy as np
import pandas as pd

# SQL libraries
import psycopg2
print(psycopg2.__version__)

# Import custom eda and sql functions
from src.toolkits.eda import get_snapshot
from src.toolkits.sql import connect_db, get_table_names

# Import dependencies for geocoding
from geopy.geocoders import Nominatim
from geopy.geocoders import GoogleV3
from geopy.extra.rate_limiter import RateLimiter

2.8.5 (dt dec pq3 ext lo64)


In [274]:
# Set notebook display options
pd.set_option('display.max_rows', 2000)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)

In [137]:
# Get project root directory
root_dir = os.path.dirname(os.getcwd())

# Set environment variables
load_dotenv(find_dotenv());
POSTGRES_USER = os.getenv("POSTGRES_USER")
POSTGRES_PASSWORD = os.getenv("POSTGRES_PASSWORD")
POSTGRES_DB = os.getenv("POSTGRES_DB")
DB_PORT = os.getenv("DB_PORT")
DB_HOST = os.getenv("DB_HOST")
DATA_URL = os.getenv("DATA_URL")

# Google Maps environment variables
GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY")
GOOGLE_AGENT = "permits-data"

# Environment variables specific to notebook
DATA_DIR = os.path.dirname(root_dir) + '/data'
DB_TABLE = "permits_raw"

## 1. Clean Data

In [138]:
# Fetch data from postgres
def fetch_data(sql, con):
    
    # Fetch fresh data
    data = pd.read_sql_query(sql, con, coerce_float=False)

    # Replace None with np.nan
    data.fillna(np.nan, inplace=True)
    
    return data

In [139]:
# Connect to db
conn = connect_db()

# Extract partial dataset
sql = 'SELECT * FROM {} LIMIT 500;'.format(DB_TABLE)

# Columns to parse as dates
#date_columns = ['status_date', 'issue_date', 'license_expiration_date']

# Fetch data
data = fetch_data(sql, conn)

Connected as user "postgres" to database "permits" on http://localhost:5432.



### 1.1 Missing Data

#### Overview of Unique Values in Qualitative Data

Before making decisions about how to address missing values, it is important to be familiar with the content of each column. In some cases data can be left alone, imputed, recollected, or dropped from the dataset. Since the permits data has mostly qualitative data and unstructured text, most of it will be left alone.

In the case of geographic data such as addresses and lat/long coordinates, it will be necessary to accurately geocode the missing values. Since this information is split across several columns they will be concatenated into one column.

In [140]:
# Get an overview of data types, # unique values, # missing values and sample value
# for each column
get_snapshot(data)

Unnamed: 0_level_0,DATA TYPE,# UNIQUE VALUES,# MISSING VALUES,SAMPLE VALUE
COLUMN,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
assessor_book,object,365,0,2688
assessor_page,object,44,0,029
assessor_parcel,object,75,0,036
tract,object,446,3,TR 3631
block,object,52,384,U
lot,object,155,4,121
reference_no_old_permit_no,object,165,304,15LA
pcis_permit_no,object,500,0,15041-90000-35045
status,object,8,0,Permit Finaled
status_date,object,423,0,10/29/2015


At the moment the only missing data of interest are *zip_code* and *latitude_longitude* coordinates, since these are necesary for mapping. 

### 1.2 Processing Missing Data

***Overview:***
* 1.2.1 - Combine address columns into one columns: *full_address*<br>
    - Correct *suffix_direction*
    - Convert *zip_code* to string
    - Concatenate to form *full_address*
* 1.2.2 - Geocode missing *latitude_longitude* with *full_address*<br>
* 1.2.3 - Split *latitude_longitude* into separate columns and convert to float values: *latitude*, *longitude*<br>
<br>
* Geocode missing *zip_code* with complete *latitude_longitude*<br>
* Geocode any missing *full_address* with *latitude_longitude*<br>

#### 1.2.1 Concatenate *full_address*

1) Correct values *suffix_direction*.<br>
2) Convert *zip_code* to string.<br>
3) Concatenate to form a complete street address string.

In [141]:
def create_column_full_address(data):

    # Truncate suffix_direction to first letter (N, S, E, W)
    data['suffix_direction'] = data['suffix_direction'].str[0].fillna('')

    # Convert zip_code to string
    data['zip_code'] = data['zip_code'].fillna('').astype(str)

    # Combine address columns to concatenate
    address_columns = ["address_start", "street_direction", "street_name", "street_suffix", "suffix_direction",
                      "zip_code"]

    # Concatenate address values
    data['full_address'] = data[address_columns].fillna('').astype(str).apply(' '.join, axis=1).str.replace('  ', ' ')

    # Replace empty strings with NaN values
    data[address_columns] = data[address_columns].replace('', np.nan)
    
    return data

In [142]:
data = create_column_full_address(data)

In [143]:
# Display
data[address_columns + ['full_address']].head()

Unnamed: 0,address_start,street_direction,street_name,street_suffix,suffix_direction,zip_code,full_address
0,1823,S,THAYER,AVE,,90025,1823 S THAYER AVE 90025
1,2122,W,54TH,ST,,90062,2122 W 54TH ST 90062
2,415,S,BURLINGTON,AVE,,90057,415 S BURLINGTON AVE 90057
3,315,S,OCEANO,DR,,90049,315 S OCEANO DR 90049
4,13640,W,PIERCE,ST,,91331,13640 W PIERCE ST 91331


#### 1.2.2 Geocode missing *latitude_longitude*

In [144]:
# Extract rows missing in latitude_longitude
data_missing = data[data['latitude_longitude'].isnull()==1]

# Size
data_missing.shape

(19, 60)

In [145]:
# Display
data_missing[['full_address', 'latitude_longitude']].head()

Unnamed: 0,full_address,latitude_longitude
5,7111 N MARISA RD 91405,
112,12453 W BROMWICH ST 91331,
146,9842 N LASSEN ROAD 91345,
160,101 S THE GROVE DR 90036,
170,1956 N CARMEN AVE 90068,


In [146]:
# Create helper function to geocode missing latitude_longitude values
def geocode(address, key, agent, timeout=None):
    
    """
    Uses GoogleMaps API to batch geocode address strings to lat/long coordinates. RateLimiter is to 
    avoid timeout errors. If an address cannot be geocoded it is left as NaN. Use of GoogleMaps 
    API incurs a charge at $0.005 per request.
    
    
    """
    
    if address:
        # Initializes GoogleMaps geocoder
        geolocator = GoogleV3(api_key=key, user_agent=agent, timeout=timeout)

        # Adds Rate Limiter to space out requests
        geocode = RateLimiter(geolocator.geocode, min_delay_seconds=1)

        # Geocode address input and format for dataframe
        location = geolocator.geocode(address)
        #print(address, location.latitude)
        
        latitude, longitude = location.latitude, location.longitude
        
        return latitude, longitude
    else:
        return np.nan

In [147]:
# Calculate cost
cost = len(data_missing) * 0.005
print("Cost for geocoding {} addresses is ${:.2f}.".format(len(data_missing), cost))

# Geocode missing coordinates using full addresses
if len(data_missing) > 0:
    data_missing['latitude_longitude'] = data_missing['full_address'].apply(geocode, args=(GOOGLE_API_KEY, 
                                                                                       "permits-data"))

# Update dataframe
data.update(data_missing)

Cost for geocoding 19 addresses is $0.10.


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  


In [148]:
# Display
data_missing[['full_address', 'latitude_longitude']].head()

Unnamed: 0,full_address,latitude_longitude
5,7111 N MARISA RD 91405,"(34.2003503, -118.4533963)"
112,12453 W BROMWICH ST 91331,"(34.2538783, -118.40469)"
146,9842 N LASSEN ROAD 91345,"(34.2498959, -118.4665838)"
160,101 S THE GROVE DR 90036,"(34.072878, -118.357463)"
170,1956 N CARMEN AVE 90068,"(34.1068231, -118.3226816)"


#### 1.2.3 Split *latitude_longitude* 

Split coordinates into separate columns and convert to float values.

In [150]:
def split_column_lat_long(data):
    
    # Check that there are no more missing coordinates before proceeding
    assert data['latitude_longitude'].notnull().any(), "Missing coordinates must be geocoded."

    # Split latitude_longitude into separate columns and convert to float values: latitude, longitude
    if ['latitude', 'longitude'] not in data.columns.tolist():
        lat_long_series = data['latitude_longitude'].astype(str).str[1:-1].str.split(',', expand=True) \
                            .astype(float).rename(columns={0: "latitude", 1: "longitude"})

        # Add to original data
        return pd.concat([data, lat_long_series], axis=1)

In [152]:
data = split_column_lat_long(data)

In [298]:
# Display
data[['latitude_longitude', 'latitude', 'longitude']].head(1)

Unnamed: 0,latitude_longitude,latitude,longitude
0,"(34.05474, -118.42628)",34.05474,-118.42628


In [345]:
def save_csv(data, path, reorder=False):

    # Check unique columns
    assert data.columns.tolist() == data.columns.unique().tolist(), "Extra columns detected."
    
    # Check for null values
    assert data['latitude'].any(), 'Column "latitude" has missing values.'
    assert data['longitude'].any(), 'Column "longitude" has missing values.'

    # Check for erroneous coordinates. All coordinates should fall within Los Angeles county.
    assert (data['latitude'] > 33.2).all() and (data['latitude'] < 34.9).all(), "Incorrect latitude detected"
    assert (data['longitude'] > -118.9).all() and (data['longitude'] < -118).all(), "Incorrect longitude detected"

    if reorder:
        # Fetch names in postgres table and use to reorder columns dataframe
        columns_reordered = get_table_names(DB_TABLE, conn).tolist()
        data = data[columns_reordered]
        print("Columns are correctly reordered.")
    
    # Write to csv
    data.to_csv(path, index=False)
    
    return

In [163]:
# Path to csv
sql_path = root_dir + '/data/interim/permits_geocoded.csv'

# Reorder rows

# Save to interim folder
save_csv(data, sql_path)

In [164]:
data.shape

(500, 62)

## 2. Update PostgreSQL Database

In [338]:
# Connect to db
conn = connect_db()

# Extract partial dataset
sql = 'SELECT * FROM {} LIMIT 500;'.format(DB_TABLE)

# Columns to parse as dates
#date_columns = ['status_date', 'issue_date', 'license_expiration_date']

# Fetch data
data = pd.read_csv(root_dir + '/data/interim/permits_geocoded.csv')

Connected as user "postgres" to database "permits" on http://localhost:5432.



In [339]:
data.head()

Unnamed: 0,assessor_book,assessor_page,assessor_parcel,tract,block,lot,reference_no_old_permit_no,pcis_permit_no,status,status_date,permit_type,permit_sub_type,permit_category,project_number,event_code,initiating_office,issue_date,address_start,address_fraction_start,address_end,address_fraction_end,street_direction,street_name,street_suffix,suffix_direction,unit_range_start,unit_range_end,zip_code,work_description,valuation,floor_area_la_zoning_code_definition,no_of_residential_dwelling_units,no_of_accessory_dwelling_units,no_of_stories,contractors_business_name,contractor_address,contractor_city,contractor_state,license_type,license_no,principal_first_name,principal_middle_name,principal_last_name,license_expiration_date,applicant_first_name,applicant_last_name,applicant_business_name,applicant_address_1,applicant_address_2,applicant_address_3,zone,occupancy,floor_area_la_building_code_definition,census_tract,council_district,latitude_longitude,applicant_relationship,existing_code,proposed_code,full_address,latitude,longitude
0,4317,3,***,TR 30210-C,,LT 1,,15044-90000-08405,Permit Finaled,09/10/2015,HVAC,1 or 2 Family Dwelling,No Plan Check,,,INTERNET,08/18/2015,1823,1/2,1823,1/2,S,THAYER,AVE,,,,90025,,,,,,,CONDITIONED AIRE MECHANICAL & ENGINEERING INC,18650 PARTHENIA STREET,NORTHRIDGE,CA,C20,532440,BRETT,MOORE,HOFFER,06/30/2016,BRETT,HOFFER,,18650 PARTHENIA ST,,"NORTHRIDGE, CA",R3-1-O,,0.0,2671.0,5,"(34.05474, -118.42628)",Net Applicant,,,1823 S THAYER AVE 90025,34.05474,-118.42628
1,5005,10,017,CHESTERFIELD SQUARE,,465,16SL57806,16016-70000-02464,Permit Finaled,08/01/2017,Bldg-Alter/Repair,1 or 2 Family Dwelling,No Plan Check,,,SOUTH LA,02/04/2016,2122,,2122,,W,54TH,ST,,,,90062,General rehabilitation for single family dwell...,40000.0,,,,,OWNER-BUILDER,,,,,0,JAVIER,,TALAMANTES,,JAVIER,TALAMANTES,OWNER-BUILDER,,,,C2-1VL,,,2325.0,8,"(33.99307, -118.31668)",Owner-Bldr,1.0,,2122 W 54TH ST 90062,33.99307,-118.31668
2,5154,23,022,SUN-SET TRACT,D,13,14VN81535,14016-20000-13092,Issued,08/13/2014,Bldg-Alter/Repair,Apartment,Plan Check,,,VAN NUYS,08/13/2014,415,,415,,S,BURLINGTON,AVE,,1-30,1-30,90057,PHOTOVOLTAIC SOLAR PANELS ON ROOF OF (E) APT BLDG,37000.0,,,,,PERMACITY CONSTRUCTION CORP,5570 W WASHINGTON BLVD,LOS ANGELES,CA,B,827864,JONATHAN,SAUL,PORT,11/30/2015,LINDA,MARTON,,710 WILSHIRE BLVD,,"SANTA MONICA, CA",R4-1,,,2089.04,1,"(34.06012, -118.26997)",Agent for Owner,5.0,,415 S BURLINGTON AVE 90057,34.06012,-118.26997
3,4404,30,010,TR 12086,,2,,16044-30000-09658,Permit Finaled,08/29/2016,HVAC,1 or 2 Family Dwelling,No Plan Check,,,WEST LA,08/22/2016,315,,315,,S,OCEANO,DR,,,,90049,,,,,,,E/C HEATING AND AIR CONDITION,26888 CUATRO MILPAS ST,VALENCIA,CA,C20,651051,EDY,RUDOLFO,CORDON,07/31/2018,,,,,,,RS-1,,0.0,2640.0,11,"(34.05707, -118.4732)",Contractor,,,315 S OCEANO DR 90049,34.05707,-118.4732
4,2646,19,011,TR 7158,,11,,17042-90000-31792,Permit Finaled,12/28/2017,Plumbing,1 or 2 Family Dwelling,No Plan Check,,,INTERNET,12/26/2017,13640,,13640,,W,PIERCE,ST,,,,91331,,,,,,,TITANIUM POWER INC,1545 S LA CIENEGA BLVD,LOS ANGELES,CA,B,989217,DENNIS,HARUO,MIYAHIRA,12/31/2017,YONI,GHERMEZI,,1545 S LA CIENEGA BLVD,,"LOS ANGELES, CA",R1-1-O,,0.0,1044.03,7,"(34.25487, -118.43002)",Net Applicant,,,13640 W PIERCE ST 91331,34.25487,-118.43002


### 2.1 Add New Columns

A list of current columns is retrieved and compared to the new columns. Current columns will be updated with new columns if they are not already present. One important issue is that the order of columns in PostgreSQL is different than the order of columns in the Pandas dataframe. The solution is to ensure that they both have the same order so that values do not end up in the wrong column.

In [340]:
def add_columns(db_table, con, run=False):

    # Get names of current columns in PostgreSQL table
    current_names = get_table_names(db_table, con)

    # Get names of updated table not in current table
    updated_names = data.columns.tolist()
    new_names = list(set(updated_names) - set(current_names))
    
    # Check names list is not empty
    if not new_names:
        print("Table is up to date.")
        return

    # Format strings for query
    alter_table_sql = "ALTER TABLE {db_table}\n"
    add_column_sql = "\tADD COLUMN {column} TEXT,\n"

    # Create a list and append ADD column statements
    sql_query = [alter_table_sql.format(db_table=db_table)]
    for name in new_names:
        sql_query.append(add_column_sql.format(column=name))

    # Join into one string
    sql_query = ''.join(sql_query)[:-2] + ";"
    
    if run:
        ### ADD TRY/EXCEPT TO RUN QUERY AGAINST DB
        try:
            print("Connecting...")
            cur = con.cursor()
            print("Executing query...")
            cur.execute(sql_query)
            print("Committing changes...")
            con.commit()
            cur.close()
            print("Database updated successfully:\nAdd columns {}".format(', '.join(new_names)))
        except Exception as e:
            conn.rollback()
            print('Error:\n', e)
    
    return sql_query

In [341]:
print(alter_table_query)

ALTER TABLE permits_raw
	ADD COLUMN latitude TEXT,
	ADD COLUMN longitude TEXT,
	ADD COLUMN full_address TEXT;


In [343]:
alter_table_query = add_columns(DB_TABLE, conn, run=True)

Connecting...
Executing query...
Committing changes...
Database updated successfully:
Add columns latitude, longitude, full_address


In [346]:
# Path to csv
sql_path = root_dir + '/data/interim/permits_geocoded.csv'

# Resave with reordered columns
save_csv(data, sql_path, reorder=True)

Columns are correctly reordered.


### 2.2 Update Table

In [347]:
data_path = '/var/local/data/interim/permits_geocoded.csv' # Path within container for COPY command
sql_path = root_dir + '/postgres/sql/update_table_values.sql'
print(data_path)
print(sql_path)

/var/local/data/interim/permits_geocoded.csv
/Users/gregory/Documents/00 Data Projects/project-portfolio/permits-data/postgres/sql/update_table_values.sql


In [356]:
### ORIGINAL
# Builds a query to update from a csv file
def update_table_values(db_table, con, data_path, sql_path, run=False):

    # CREATE TABLE and COPY
    tmp_table = "tmp_" + db_table
    
    column_names = get_table_names(db_table, con)
    column_names = column_names.tolist()
    names = ',\n\t'.join(['{}'.format(name) + " TEXT" for name in column_names])

    create_tmp_table_sql = 'CREATE TABLE public.{tmp_table} (\n\t{names}\n);\n\n'.format(tmp_table=tmp_table, names=names)
    copy_from_table_sql = "COPY public.{tmp_table} FROM \'{data_path}\' (FORMAT csv, HEADER TRUE);\n\nUPDATE {db_table}\n".format(tmp_table=tmp_table, 
                                                                                                   data_path=data_path, db_table=db_table)

    # SET statements
    updates_sql = ["SET "]

    for name in column_names:
        original_name = '{}'.format(name)
        set_sql = "{name} = {tmp_name},\n\t".format(name=original_name, 
                                                   tmp_name=tmp_table + '.' + name)
        updates_sql.append(set_sql)

    updates_sql = ''.join(updates_sql)

    updates_sql = updates_sql[:-3] + "\n"

    # FROM and WHERE clause
    tail_sql = "FROM public.{tmp_table}\nWHERE {db_table}.pcis_permit_no = {tmp_table}.pcis_permit_no;\n" \
        .format(tmp_table=tmp_table, db_table=db_table)

    sql_query = create_tmp_table_sql + copy_from_table_sql + updates_sql + tail_sql
    
    if run:
        ### ADD TRY/EXCEPT TO RUN QUERY AGAINST DB
        try:
            cur = con.cursor()
            print("Executing...")
            cur.execute(sql_query)
            con.commit()
            cur.close()
            print('Table "{}" is updated.'.format(db_table))
        except Exception as e:
            con.rollback()
            print('Error:\n', e)
    
    return sql_query
        

In [351]:
update_table_values_query = update_table_values(DB_TABLE, con=conn, data_path=data_path, sql_path=sql_path, run=True)

Executing...
Connection closed.


In [352]:
print(update_table_values_query)

CREATE TABLE public.tmp_permits_raw (
	assessor_book TEXT,
	assessor_page TEXT,
	assessor_parcel TEXT,
	tract TEXT,
	block TEXT,
	lot TEXT,
	reference_no_old_permit_no TEXT,
	pcis_permit_no TEXT,
	status TEXT,
	status_date TEXT,
	permit_type TEXT,
	permit_sub_type TEXT,
	permit_category TEXT,
	project_number TEXT,
	event_code TEXT,
	initiating_office TEXT,
	issue_date TEXT,
	address_start TEXT,
	address_fraction_start TEXT,
	address_end TEXT,
	address_fraction_end TEXT,
	street_direction TEXT,
	street_name TEXT,
	street_suffix TEXT,
	suffix_direction TEXT,
	unit_range_start TEXT,
	unit_range_end TEXT,
	zip_code TEXT,
	work_description TEXT,
	valuation TEXT,
	floor_area_la_zoning_code_definition TEXT,
	no_of_residential_dwelling_units TEXT,
	no_of_accessory_dwelling_units TEXT,
	no_of_stories TEXT,
	contractors_business_name TEXT,
	contractor_address TEXT,
	contractor_city TEXT,
	contractor_state TEXT,
	license_type TEXT,
	license_no TEXT,
	principal_first_name TEXT,
	principal_middle_n

#### Check Update Success

In [354]:
# Connect to db
conn = connect_db()

# Extract partial dataset
sql = 'SELECT * FROM {} LIMIT 500;'.format(DB_TABLE)

# Columns to parse as dates
#date_columns = ['status_date', 'issue_date', 'license_expiration_date']

# Fetch data
data = fetch_data(sql, conn)

# Display
data.head(10)

Connected as user "postgres" to database "permits" on http://localhost:5432.



Unnamed: 0,assessor_book,assessor_page,assessor_parcel,tract,block,lot,reference_no_old_permit_no,pcis_permit_no,status,status_date,permit_type,permit_sub_type,permit_category,project_number,event_code,initiating_office,issue_date,address_start,address_fraction_start,address_end,address_fraction_end,street_direction,street_name,street_suffix,suffix_direction,unit_range_start,unit_range_end,zip_code,work_description,valuation,floor_area_la_zoning_code_definition,no_of_residential_dwelling_units,no_of_accessory_dwelling_units,no_of_stories,contractors_business_name,contractor_address,contractor_city,contractor_state,license_type,license_no,principal_first_name,principal_middle_name,principal_last_name,license_expiration_date,applicant_first_name,applicant_last_name,applicant_business_name,applicant_address_1,applicant_address_2,applicant_address_3,zone,occupancy,floor_area_la_building_code_definition,census_tract,council_district,latitude_longitude,applicant_relationship,existing_code,proposed_code,latitude,longitude,full_address
0,2027,4,011,TR 26618,,11,,17042-90000-21398,Issued,08/30/2017,Plumbing,1 or 2 Family Dwelling,No Plan Check,,,INTERNET,08/30/2017,23311,,23311,,W,WINDOM,ST,,,,91304,,,,,,,OWNER-BUILDER,,,,,0,,,,,KRISTEN,IMHOFF,,5048 CAMPO,,"WOODLAND HILLS, CA",RE11-1,,0.0,1344.22,12,"(34.20709, -118.63795)",Owner-Bldr,,,34.20709,-118.63795,23311 W WINDOM ST 91304
1,4317,3,***,TR 30210-C,,LT 1,,15044-90000-08405,Permit Finaled,09/10/2015,HVAC,1 or 2 Family Dwelling,No Plan Check,,,INTERNET,08/18/2015,1823,1/2,1823,1/2,S,THAYER,AVE,,,,90025,,,,,,,CONDITIONED AIRE MECHANICAL & ENGINEERING INC,18650 PARTHENIA STREET,NORTHRIDGE,CA,C20,532440,BRETT,MOORE,HOFFER,06/30/2016,BRETT,HOFFER,,18650 PARTHENIA ST,,"NORTHRIDGE, CA",R3-1-O,,0.0,2671.0,5,"(34.05474, -118.42628)",Net Applicant,,,34.05474,-118.42628,1823 S THAYER AVE 90025
2,5005,10,017,CHESTERFIELD SQUARE,,465,16SL57806,16016-70000-02464,Permit Finaled,08/01/2017,Bldg-Alter/Repair,1 or 2 Family Dwelling,No Plan Check,,,SOUTH LA,02/04/2016,2122,,2122,,W,54TH,ST,,,,90062,General rehabilitation for single family dwell...,40000.0,,,,,OWNER-BUILDER,,,,,0,JAVIER,,TALAMANTES,,JAVIER,TALAMANTES,OWNER-BUILDER,,,,C2-1VL,,,2325.0,8,"(33.99307, -118.31668)",Owner-Bldr,1.0,,33.99307,-118.31668,2122 W 54TH ST 90062
3,5154,23,022,SUN-SET TRACT,D,13,14VN81535,14016-20000-13092,Issued,08/13/2014,Bldg-Alter/Repair,Apartment,Plan Check,,,VAN NUYS,08/13/2014,415,,415,,S,BURLINGTON,AVE,,1-30,1-30,90057,PHOTOVOLTAIC SOLAR PANELS ON ROOF OF (E) APT BLDG,37000.0,,,,,PERMACITY CONSTRUCTION CORP,5570 W WASHINGTON BLVD,LOS ANGELES,CA,B,827864,JONATHAN,SAUL,PORT,11/30/2015,LINDA,MARTON,,710 WILSHIRE BLVD,,"SANTA MONICA, CA",R4-1,,,2089.04,1,"(34.06012, -118.26997)",Agent for Owner,5.0,,34.06012,-118.26997,415 S BURLINGTON AVE 90057
4,4404,30,010,TR 12086,,2,,16044-30000-09658,Permit Finaled,08/29/2016,HVAC,1 or 2 Family Dwelling,No Plan Check,,,WEST LA,08/22/2016,315,,315,,S,OCEANO,DR,,,,90049,,,,,,,E/C HEATING AND AIR CONDITION,26888 CUATRO MILPAS ST,VALENCIA,CA,C20,651051,EDY,RUDOLFO,CORDON,07/31/2018,,,,,,,RS-1,,0.0,2640.0,11,"(34.05707, -118.4732)",Contractor,,,34.05707,-118.4732,315 S OCEANO DR 90049
5,2646,19,011,TR 7158,,11,,17042-90000-31792,Permit Finaled,12/28/2017,Plumbing,1 or 2 Family Dwelling,No Plan Check,,,INTERNET,12/26/2017,13640,,13640,,W,PIERCE,ST,,,,91331,,,,,,,TITANIUM POWER INC,1545 S LA CIENEGA BLVD,LOS ANGELES,CA,B,989217,DENNIS,HARUO,MIYAHIRA,12/31/2017,YONI,GHERMEZI,,1545 S LA CIENEGA BLVD,,"LOS ANGELES, CA",R1-1-O,,0.0,1044.03,7,"(34.25487, -118.43002)",Net Applicant,,,34.254870000000004,-118.43002,13640 W PIERCE ST 91331
6,2219,27,052,TR 73820,,52,18VN77133,17010-20000-02747,CofO Issued,04/05/2019,Bldg-New,1 or 2 Family Dwelling,Plan Check,,,VAN NUYS,09/21/2018,7111,,7111,,N,MARISA,RD,,,,91405,"NEW SFD/GARAGE - PLAN 1A, LOT-52",196660.0,1560.0,1.0,,2.0,OWNER-BUILDER,,,,,0,,,,,DAVID,LELIE,,25152 SPRINGFIELD CT,#180,"VALENCIA, CA",(T)(Q)RD2-1,,1985.0,1278.03,6,"(34.2003503, -118.4533963)",Agent for Owner,,1.0,34.2003503,-118.4533963,7111 N MARISA RD 91405
7,2748,27,001,TR 22446,,298,14LA,14041-10000-02537,Permit Finaled,10/16/2014,Electrical,1 or 2 Family Dwelling,Plan Check,,,METRO,04/09/2014,9672,,9672,,N,LARAMIE,AVE,,PV1,,91311,"5.8 KW DC ROOF MOUNT PV, (24) MODULES, (1) TRA...",,,,,,SMART ENERGY SOLAR INC,1641 COMMERCE STREET,CORONA,CA,C46,990049,LEOBARDO,JOAQUIN,BAUTISTA,01/31/2016,,,,,,,RS-1,,0.0,1133.03,12,"(34.24591, -118.57345)",Contractor,,,34.245909999999995,-118.57345,9672 N LARAMIE AVE 91311
8,4234,9,009,TR 19428,,19,16WL75914,16016-30000-26238,Permit Finaled,06/27/2018,Bldg-Alter/Repair,Apartment,No Plan Check,,,WEST LA,11/01/2016,3755,,3755,,S,BUTLER,AVE,,1 & 7,,90066,KITCHEN AND BATHROOM REMODEL FOR RESIDENTIAL B...,12000.0,,,,,AMARO MARK,3230 MERRILL DRIVE UNIT 77,TORRANCE,CA,B,954226,MARK,,AMARO,11/30/2016,MARK,AMARO,,,,,R3-1,,,2719.01,11,"(34.01023, -118.42294)",Contractor,5.0,,34.01023,-118.42294,3755 S BUTLER AVE 90066
9,5154,13,011,NORTH KNOB HILL TRACT,,27,,18042-20000-12284,Permit Finaled,05/30/2018,Plumbing,Apartment,No Plan Check,,,VAN NUYS,05/21/2018,229,,233,,S,CRANDALL,ST,,,,90057,,,,,,,L G S COMPLIANCE ALLIANCE RETROFITTING,675 S GLENWOOD PLACE,BURBANK,CA,C36,900919,GLEN,ROY,CHRISTENSEN,07/31/2019,RAMON,,,,,,R3-1,,0.0,2085.02,13,"(34.06634, -118.27401)",Agent for Contractor,,,34.066340000000004,-118.27401,229 S CRANDALL ST 90057


In [355]:
data.shape

(500, 62)