# Module 6: Custom Exploratory Data Analysis
**Author: Caleb Sellinger**
**Date: 21 February 2025**
### Purpose:
This project explores affordable rental housing in the City of Chicago. It walks through the processes, methods, and techniques used to fetch, transform, and store the data, followed by an analysis of the dataset.

## Section 1: Fetching Data
This section fetches the dataset from the specified URL, a Data.gov download link for Chicago's affordable rental housing developments. Two functions retrieve the dataset:
1. get_csv: Retrieves the .csv file from the given URL and passes its contents to write_csv.
2. write_csv: Writes the contents to a new file (Affordable_Rental_Housing_Developments.csv) and saves it in a named folder (datasets).

Because the folder, filename, and URL are all passed in as arguments, this code can be reused for any URL that serves a .csv file.

In [17]:
import pathlib
import requests
from utils_logger import logger

def get_csv(save_folder: str, filename: str, url: str) -> None:
    """
    Retrieve .csv file from given URL, write to new file, and save to named folder.

    Arguments:
    save_folder -- Name of folder to save to.
    filename -- Name to give the saved file.
    url -- URL to retrieve the .csv file from.

    Returns: None
    """
    if not url:
        logger.error(
            "The URL provided is empty or does not exist. Please provide a valid URL."
        )
        return

    try:
        logger.info(f"Retrieving CSV file from {url}...")
        response = requests.get(url)
        response.raise_for_status()
        write_csv(save_folder, filename, response.text)
        logger.info(f"Successfully retrieved and saved file {filename}!")
    except requests.exceptions.HTTPError as http_err:
        logger.error(f"HTTP error: {http_err}")
    except requests.exceptions.RequestException as req_err:
        logger.error(f"Request error: {req_err}")


def write_csv(save_folder: str, filename: str, csv_data: str) -> None:
    """
    Write .csv file to new file and save to folder.

    Arguments:
    save_folder -- Name of folder to save to.
    filename -- Name of the file to write to.
    csv_data -- .csv content as string.

    Returns: None
    """
    file_path = pathlib.Path(save_folder).joinpath(filename)

    try:
        logger.info(f"Writing data to file: {filename}...")
        file_path.parent.mkdir(parents=True, exist_ok=True)
        # The context manager closes the file even if the write fails.
        with file_path.open("w") as file:
            file.write(csv_data)
        logger.info(f"SUCCESS: data written to new file {filename}")
    except IOError as io_err:
        # Log the caught exception instance, not the IOError class itself.
        logger.error(f"Error writing to file: {io_err}")


def main():
    """
    Main function for running program
    """
    csv_url = "https://data.cityofchicago.org/api/views/s6ha-ppgi/rows.csv?accessType=DOWNLOAD"
    logger.info("Retrieving file...")
    get_csv("datasets", "Affordable_Rental_Housing_Developments.csv", csv_url)


if __name__ == "__main__":
    main()

2025-02-16 16:18:39.524 | INFO     | __main__:main:63 - Retrieving file...
2025-02-16 16:18:39.525 | INFO     | __main__:get_csv:23 - Retrieving CSV file from https://data.cityofchicago.org/api/views/s6ha-ppgi/rows.csv?accessType=DOWNLOAD...
2025-02-16 16:18:39.905 | INFO     | __main__:write_csv:48 - Writing data to file: Affordable_Rental_Housing_Developments.csv...
2025-02-16 16:18:39.907 | INFO     | __main__:write_csv:53 - SUCCESS: data written to new file Affordable_Rental_Housing_Developments.csv
2025-02-16 16:18:39.908 | INFO     | __main__:get_csv:27 - Successfully retrieved and saved file Affordable_Rental_Housing_Developments.csv!
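
The write step can also be expressed with an explicit text encoding, so that the file can later be read back with the same encoding it was written in. The sketch below is a minimal, hypothetical variant: `save_text` is not part of this notebook, and it writes made-up CSV text to a temporary folder rather than to `datasets`. Whether an explicit encoding would change how the real file needs to be read later is an assumption, not something this notebook confirms.

```python
import pathlib
import tempfile


def save_text(save_folder: str, filename: str, text: str, encoding: str = "utf-8") -> pathlib.Path:
    """Hypothetical helper: write text to save_folder/filename with an explicit encoding."""
    file_path = pathlib.Path(save_folder) / filename
    # Create the target folder (and any parents) if it does not exist yet.
    file_path.parent.mkdir(parents=True, exist_ok=True)
    # write_text opens, writes, and closes the file in one call.
    file_path.write_text(text, encoding=encoding)
    return file_path


# Usage with a throwaway folder and made-up CSV text (not the real dataset).
with tempfile.TemporaryDirectory() as tmp_dir:
    sample = "Property Name,Units\nCafé Lofts,3\n"
    target_folder = str(pathlib.Path(tmp_dir) / "datasets")
    saved = save_text(target_folder, "sample.csv", sample)
    # Read the file back with the same encoding it was written in.
    round_trip = saved.read_text(encoding="utf-8")
```

Returning the path instead of `None` lets the caller check where the file landed, which is a design choice of this sketch rather than of the functions above.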


This cell loads the .csv into a pandas DataFrame so we can quickly inspect its columns, data types, and non-null counts.

In [18]:
import pathlib

import pandas as pd

fetched_folder_name: str = "datasets"
data = "Affordable_Rental_Housing_Developments.csv"

def process(file_path: str, folder: str) -> None:
    """
    Read the .csv from the specified folder into a pandas DataFrame and print a summary.
    """
    input_file = pathlib.Path(folder, file_path)

    # Reading the file as UTF-8 raised a decode error, so it is read as latin-1 instead.
    df = pd.read_csv(input_file, encoding="latin-1")

    # print(df.head())
    df.info()

def main():
    """
    Main function for running program.
    """

    process(data,fetched_folder_name)

if __name__ == "__main__":
    main()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 598 entries, 0 to 597
Data columns (total 14 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   Community Area Name    598 non-null    object 
 1   Community Area Number  598 non-null    int64  
 2   Property Type          598 non-null    object 
 3   Property Name          598 non-null    object 
 4   Address                598 non-null    object 
 5   Zip Code               598 non-null    int64  
 6   Phone Number           598 non-null    object 
 7   Management Company     598 non-null    object 
 8   Units                  597 non-null    float64
 9   X Coordinate           598 non-null    float64
 10  Y Coordinate           598 non-null    float64
 11  Latitude               598 non-null    float64
 12  Longitude              598 non-null    float64
 13  Location               589 non-null    object 
dtypes: float64(5), int64(2), object(7)
memory usage: 65.5+ KB
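
The latin-1 workaround in the cell above can be generalized into a fallback: try UTF-8 first and switch to latin-1 only when decoding fails. The sketch below illustrates the idea with the standard library on made-up bytes; `decode_with_fallback` is a hypothetical helper, not part of this notebook. A pandas version would presumably wrap `pd.read_csv(path, encoding=...)` in the same try/except loop.

```python
import csv
import io


def decode_with_fallback(raw: bytes, encodings=("utf-8", "latin-1")) -> tuple[str, str]:
    """Hypothetical helper: return (text, encoding) for the first encoding that decodes raw."""
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            # This encoding cannot decode the bytes; try the next one.
            continue
    raise ValueError(f"None of {encodings} could decode the data")


# Made-up bytes: 0xE9 is "é" in latin-1 but is not valid UTF-8 on its own.
raw_bytes = b"Property Name,Units\nCaf\xe9 Lofts,3\n"
text, used_encoding = decode_with_fallback(raw_bytes)

# Parse the decoded text as CSV to confirm the row survived intact.
rows = list(csv.DictReader(io.StringIO(text)))
```

latin-1 maps every possible byte to a character, so it never raises a decode error and belongs last in the list. It can still produce the wrong characters if the file was actually written in a different encoding.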


This cell creates a SQLite database (db.sqlite3) and loads the dataset into a table named data.

In [19]:
import pathlib
import sqlite3

import pandas as pd

# Define the database file in the current root project directory
db_file = pathlib.Path("db.sqlite3")


def create_database():
    """Function to create a database. Connecting for the first time
    will create a new database file if it doesn't exist yet."""
    try:
        conn = sqlite3.connect(db_file)
        conn.close()
        print("Database created successfully.")
    except sqlite3.Error as e:
        print("Error creating the database:", e)


# def create_tables():
#     """Function to read and execute SQL statements to drop existing table and create new ones."""
#     try:
#         with sqlite3.connect(db_file) as conn:
#             sql_file = pathlib.Path("sql", "create_tables.sql")
#             with open(sql_file, "r") as file:
#                 sql_script = file.read()
#             conn.executescript(sql_script)
#             print("Tables created successfully.")
#     except sqlite3.Error as e:
#         print("Error creating tables:", e)

def insert_data_from_csv():
    """Function to load the .csv records into the data table, replacing the table if it already exists."""
    try:
        with sqlite3.connect(db_file) as conn:
            data_path = pathlib.Path("datasets", "Affordable_Rental_Housing_Developments.csv")
            df = pd.read_csv(data_path,encoding='latin-1')
            # use the pandas DataFrame to_sql() method to insert data
            # pass in the table name and the connection
            df.to_sql("data", conn, if_exists="replace", index=False)
            print("Data inserted successfully.")
    except (sqlite3.Error, pd.errors.EmptyDataError, FileNotFoundError) as e:
        print("Error inserting data:", e)

def main():
    create_database()
    # create_tables()
    insert_data_from_csv()


if __name__ == "__main__":
    main()

Database created successfully.
Data inserted successfully.
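
A quick way to confirm that a load like this worked is to count the rows and list the columns of the new table. The sketch below shows that check against a stand-in in-memory table holding two columns and two rows taken from the dataset; it is an illustration under those assumptions, not a check the notebook itself runs. Against the real database, one would connect to `db.sqlite3` instead and expect the 598 rows reported by `df.info()`.

```python
import sqlite3
from contextlib import closing

# Stand-in for db.sqlite3: an in-memory database with a small "data" table.
with closing(sqlite3.connect(":memory:")) as conn:
    conn.execute('CREATE TABLE data ("Property Name" TEXT, "Units" REAL)')
    conn.executemany(
        "INSERT INTO data VALUES (?, ?)",
        [("Hairpin Lofts", 25.0), ("1000M", 23.0)],
    )
    # Count the rows that were loaded.
    row_count = conn.execute("SELECT COUNT(*) FROM data").fetchone()[0]
    # PRAGMA table_info returns one row per column; index 1 holds the column name.
    column_names = [row[1] for row in conn.execute("PRAGMA table_info(data)")]
```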


Here are the first 5 rows of the table created by the code above.

In [20]:
pd.read_sql("SELECT * FROM data", con=sqlite3.connect(db_file)).head()

| | Community Area Name | Community Area Number | Property Type | Property Name | Address | Zip Code | Phone Number | Management Company | Units | X Coordinate | Y Coordinate | Latitude | Longitude | Location |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Avondale | 21 | Multifamily | Hairpin Lofts | 3414 W. Diversey Ave. | 60647 | 773-292-6360 | Leasing & Management Co. Inc. | 25.0 | 1153078.89 | 1918447.998 | 41.932073 | -87.712872 | |
| 1 | Loop | 32 | ARO | 1000M | 1000 S. Michigan Ave. | 60605 | 312-820-1000 | Willow Bridge | 23.0 | 1177375.505 | 1895971.036 | 41.869878 | -87.624269 | |
| 2 | Logan Square | 22 | ARO | 2556 Armtiage LLC | 2556 W. Armitage Ave | 60647 | 773-252-0600 | North Clybourn Group | 1.0 | 1158751.315 | 1913231.215 | 41.917643 | -87.69217 | (41.917642826462, -87.6921699562562) |
| 3 | Douglas | 35 | Multifamily | South Park Plaza | 2600 S. King Dr. | 60616 | 312-674-9210 | Woodlawn Comm. Dev. Corp. | 134.0 | 1179206.472 | 1887158.196 | 41.845653 | -87.617816 | (41.8456529117633, -87.6178163910093) |
| 4 | Near West Side | 28 | ARO | The Rosie | 1461 S. Blue Island Ave. | 60608 | 872-259-7452 | The FLATS | 7.0 | 1168331.384 | 1892984.019 | 41.861881 | -87.657558 | (41.86188117554516, -87.65755843617394) |
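
Because several column names contain spaces, SQL queries against this table need quoted identifiers. The sketch below is a small illustration, assuming an in-memory stand-in table built from the five rows shown above and reduced to three columns. It sorts and limits the rows inside the query, rather than fetching the whole table and trimming it afterwards.

```python
import sqlite3
from contextlib import closing

# The five rows shown above, reduced to three columns.
sample_rows = [
    ("Avondale", "Hairpin Lofts", 25.0),
    ("Loop", "1000M", 23.0),
    ("Logan Square", "2556 Armtiage LLC", 1.0),
    ("Douglas", "South Park Plaza", 134.0),
    ("Near West Side", "The Rosie", 7.0),
]

with closing(sqlite3.connect(":memory:")) as conn:
    conn.execute(
        'CREATE TABLE data ("Community Area Name" TEXT, "Property Name" TEXT, "Units" REAL)'
    )
    conn.executemany("INSERT INTO data VALUES (?, ?, ?)", sample_rows)
    # Double quotes mark identifiers, which column names with spaces require.
    query = (
        'SELECT "Community Area Name", "Units" '
        "FROM data "
        'ORDER BY "Units" DESC '
        "LIMIT 3"
    )
    # Fetch the three developments with the most units.
    largest = conn.execute(query).fetchall()
```

`contextlib.closing` is used here because a `sqlite3` connection used directly in a `with` statement only commits or rolls back the transaction; it does not close the connection.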
