## 🏙️ City-Owned Land Data Cleaning (Chicago Data Portal)

This notebook processes the “City-Owned Land Inventory” dataset downloaded from the [Chicago Data Portal](https://data.cityofchicago.org/). The dataset contains details about properties owned by the City of Chicago, including their location, sales status, and property condition.

### Objective:
To identify and clean city-owned parcels by:
- Keeping only properties that are **owned by the city**
- Cleaning and standardizing text fields (e.g., stripping leading and trailing spaces)
- Excluding properties that are **not available for acquisition**, specifically those with a sales status of `"Application Closed"` or `"Offered"`

### Steps:
1. Load the raw dataset and inspect columns and value distributions.
2. Keep rows whose `Property Status` is `"Owned by City"`, including the misspelled variant `"Ownd by City"`.
3. Remove parcels whose `Sales Status` is `"Application Closed"` or `"Offered"`.
4. Output the cleaned dataset for further urban planning and analysis.

✅ The resulting dataset provides a clean list of city-owned land parcels that are potentially available for reuse or development.
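
The steps above can be sketched as one small function. This is a minimal illustration on made-up rows, not the notebook's actual cells: the `clean_city_owned` helper and the toy `raw` frame are hypothetical, while the column names and status values are taken from the dataset.

```python
import pandas as pd

# Hypothetical toy rows standing in for the real inventory CSV.
raw = pd.DataFrame({
    "PIN": ["1", "2", "3", "4", "5"],
    "Property Status": ["Owned by City", " Ownd by City ", "Sold", "Owned by City", None],
    "Sales Status": [None, "Offered", "Interest", "Application Closed", "Apply"],
})

def clean_city_owned(df: pd.DataFrame) -> pd.DataFrame:
    """Keep city-owned parcels whose sales status is not 'Application Closed' or 'Offered'."""
    out = df.copy()
    # Strip stray whitespace so padded values still match.
    out["Property Status"] = out["Property Status"].str.strip()
    # Accept the correct spelling and the known typo.
    owned = out["Property Status"].isin(["Owned by City", "Ownd by City"])
    out = out[owned].copy()
    out["Property Status"] = "Owned by City"
    # Drop parcels that are no longer available.
    unavailable = out["Sales Status"].isin(["Application Closed", "Offered"])
    return out[~unavailable].reset_index(drop=True)

cleaned = clean_city_owned(raw)
print(cleaned)
```

Wrapping the logic in a function makes it easier to rerun on a fresh download without passing through intermediate CSV files.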

### City Owned Land Filtered

In [2]:
import pandas as pd
# Load the CSV file downloaded from the Chicago Data Portal
file_path1 = "C:/Users/kaur6/Downloads/Urban Analytics/City-Owned_Land_Inventory.csv"
df1 = pd.read_csv(file_path1)

# Display columns and number of rows
print("Columns in the dataset:", df1.columns.tolist())
print("Number of rows:", df1.shape[0])

Columns in the dataset: ['ID', 'PIN', 'Address', 'Managing Organization', 'Property Status', 'Date of Acquisition', 'Date of Disposition', 'Sales Status', 'Sale Offering Status', 'Sale Offering Reason', 'Sq. Ft.', 'Square Footage - City Estimate', 'Land Value (2022)', 'Ward', 'Community Area Number', 'Community Area Name', 'Zoning Classification', 'Zip Code', 'Last Update', 'Application Use', 'Grouped Parcels', 'Application Deadline', 'Offer Round', 'Application URL', 'X Coordinate', 'Y Coordinate', 'Latitude', 'Longitude', 'Location']
Number of rows: 20622


In [3]:
# Print unique values in 'Property Status'
print("Unique values in 'Property Status':", df1['Property Status'].unique())

Unique values in 'Property Status': ['Owned by City' 'Sold' 'Sold By City' 'Leased' 'Not City Owned'
 'Sold by City' 'Ownd by City' nan 'In Acquisition']
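
The output above shows variants that differ only by case or a typo (`'Sold By City'` vs `'Sold by City'`, `'Ownd by City'` vs `'Owned by City'`). One possible way to collapse them is a lower-cased lookup table. This is a sketch under assumptions: the `corrections` mapping is hypothetical, and it deliberately does not merge `'Sold'` with `'Sold by City'`, since the source does not say whether they mean the same thing.

```python
import pandas as pd

# The distinct values printed above, one per row.
statuses = pd.Series(
    ["Owned by City", "Sold", "Sold By City", "Leased", "Not City Owned",
     "Sold by City", "Ownd by City", None, "In Acquisition"]
)

# Hypothetical correction table: lower-cased variant -> canonical label.
corrections = {"ownd by city": "Owned by City", "sold by city": "Sold by City"}

stripped = statuses.str.strip()
key = stripped.str.lower()
# Use the canonical label where one is defined, otherwise keep the stripped original.
normalized = key.map(corrections).fillna(stripped)
print(normalized.value_counts(dropna=False))
```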


In [4]:
# Standardize the 'Property Status' column
df1['Property Status'] = df1['Property Status'].str.strip()  # Remove leading/trailing spaces

# Filter rows where 'Property Status' is 'Owned by City' or its misspelled version
filtered_df = df1[df1['Property Status'].isin(['Owned by City', 'Ownd by City'])].copy()

# Correct the misspelled value
filtered_df['Property Status'] = 'Owned by City'

# Save to a new CSV file
output_path = "C:/Users/kaur6/Downloads/Urban Analytics/Owned_By_City_Properties.csv"
filtered_df.to_csv(output_path, index=False)

print(f"Filtered data saved to: {output_path}")

Filtered data saved to: C:/Users/kaur6/Downloads/Urban Analytics/Owned_By_City_Properties.csv


In [5]:
print("Number of rows:", filtered_df.shape[0])

Number of rows: 12416


In [6]:
file = pd.read_csv("C:/Users/kaur6/Downloads/Urban Analytics/Owned_By_City_Properties.csv")
# Count null values for each column
null_counts = file.isnull().sum()
# Print null value counts
print("Null value count for each column:\n", null_counts)

Null value count for each column:
 ID                                    0
PIN                                   0
Address                             714
Managing Organization              7163
Property Status                       0
Date of Acquisition                4546
Date of Disposition               12411
Sales Status                       3561
Sale Offering Status              11674
Sale Offering Reason              11673
Sq. Ft.                             381
Square Footage - City Estimate     8237
Land Value (2022)                  8229
Ward                                831
Community Area Number               831
Community Area Name                 834
Zoning Classification               834
Zip Code                            384
Last Update                           0
Application Use                   11600
Grouped Parcels                   11938
Application Deadline              11331
Offer Round                       11324
Application URL                   11444
X Coo
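
Raw null counts are easier to compare as a share of all rows; for instance, `Date of Disposition` is null in 12,411 of the 12,416 filtered rows. The sketch below shows one way to compute that share on a hypothetical toy frame (`toy` and its values are made up for illustration).

```python
import pandas as pd

# Hypothetical toy frame; the real data has 12,416 rows.
toy = pd.DataFrame({
    "PIN": ["1", "2", "3", "4"],
    "Address": ["A St", None, "C St", "D St"],
    "Date of Disposition": [None, None, None, None],
})

# The mean of a boolean mask is the fraction of True values, i.e. the null share.
null_share = toy.isnull().mean().mul(100).round(1).sort_values(ascending=False)
print(null_share)
```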

In [8]:
# Print unique values in 'Sales Status' (from the full, unfiltered dataset)
print("Unique values in 'Sales Status':", df1['Sales Status'].unique())

Unique values in 'Sales Status': [nan 'Interest' 'Application(s) Received' 'Offered' 'Application Closed'
 'Not offered' 'Application Received' 'See note' 'Partially verified'
 'Verified' 'Apply']


In [9]:
# Count rows where 'Sales Status' is 'Application Closed'
application_closed_count = (file['Sales Status'] == 'Application Closed').sum()
# Print the count
print("Number of rows where 'Sales Status' is 'Application Closed':", application_closed_count)
# Count rows where 'Sales Status' is 'Offered'
offered_count = (file['Sales Status'] == 'Offered').sum()
# Print the count
print("Number of rows where 'Sales Status' is 'Offered':", offered_count)

Number of rows where 'Sales Status' is 'Application Closed': 515
Number of rows where 'Sales Status' is 'Offered': 985
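
As an alternative to counting each status separately, `value_counts` tallies every status in one call, and `dropna=False` also reports missing values. A minimal sketch on hypothetical toy values:

```python
import pandas as pd

# Hypothetical toy column; the status labels are taken from the dataset.
sales = pd.Series(
    [None, "Offered", "Offered", "Application Closed", "Interest", None],
    name="Sales Status",
)

# One tally per distinct status, including missing values.
counts = sales.value_counts(dropna=False)
print(counts)
```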


In [10]:
# Filter out rows where 'Sales Status' is 'Application Closed' or 'Offered'
# Rows with a missing 'Sales Status' are kept, because NaN never matches isin()
df_cleaned = file[~file['Sales Status'].isin(['Application Closed', 'Offered'])]

# Save the cleaned data to a new CSV file
output_path = "C:/Users/kaur6/Downloads/Urban Analytics/Cleaned_City_Owned_Land.csv"
df_cleaned.to_csv(output_path, index=False)

print(f"Rows with 'Application Closed' and 'Offered' removed. Cleaned data saved to: {output_path}")

Rows with 'Application Closed' and 'Offered' removed. Cleaned data saved to: C:/Users/kaur6/Downloads/Urban Analytics/Cleaned_City_Owned_Land.csv


In [11]:
print("Number of rows:", df_cleaned.shape[0])

Number of rows: 10916
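
The row counts reported above are consistent: 12,416 - 515 - 985 = 10,916. Because `isin` returns `False` for missing values, parcels with no `Sales Status` remain in the cleaned output. The sketch below illustrates both points on a hypothetical toy frame.

```python
import pandas as pd

# Hypothetical toy frame with one missing status.
toy = pd.DataFrame({"Sales Status": [None, "Offered", "Application Closed", "Interest"]})
excluded = ["Application Closed", "Offered"]

mask = toy["Sales Status"].isin(excluded)  # False for the missing value
kept = toy[~mask]

# Row accounting: kept rows plus removed rows equal the starting total.
assert len(kept) + mask.sum() == len(toy)

# The same check with the counts reported in the notebook.
assert 12416 - 515 - 985 == 10916

print("Rows kept:", len(kept), "| of which missing status:", kept["Sales Status"].isna().sum())
```

If parcels with no sales status should also be excluded, the mask could be extended with `toy["Sales Status"].isna()`; whether that is appropriate depends on what a missing status means in this dataset.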
