# Project 1: How Public Wi-Fi Is Distributed Across New York City

**Why this topic**  
Public Wi-Fi sounds mundane, but it is part of digital inclusion. When home internet is unaffordable or unstable, free hotspots at libraries, parks, and sidewalks help with school, job applications, telehealth, and city services. I wanted a first-pass view of **where** these hotspots are concentrated across the five boroughs. Not a fancy map. Just a clear, reproducible profile that anyone can run.

**My goal**  
My goal with this project was to build a short, readable snapshot of NYC’s public Wi-Fi so that anyone can see at a glance which boroughs have more hotspots, using basic summary statistics and a tiny text chart.

I did that by:

1. Loading the NYC Wi-Fi Hotspot Locations dataset and using the `Latitude` column as my numeric example, then calculating the mean, median, and mode with pandas to confirm that the data loaded cleanly and that the points sit where NYC should be.

2. Repeating those same three numbers (mean, median, mode) using only the Python standard library to show I can get the same result even without any data libraries.

3. Counting how many Wi-Fi hotspots are in each borough and printing a simple text chart made of `+` signs (with a legend like “1 `+` = 50 hotspots”) so you can quickly see which boroughs have more coverage.


## Dataset and source

- Dataset: NYC Wi-Fi Hotspot Locations  
- Source: NYC Open Data  
- File: `NYC_Wi-Fi_Hotspot_Locations_20251108.csv`  
- Link: <https://data.cityofnewyork.us/City-Government/NYC-Wi-Fi-Hotspot-Locations/yjub-udmw/about_data> 


## Part A: Numeric Summary 

Why Latitude? The assignment needs one numeric column. Latitude is clean and numeric in this file, so it is a good way to demonstrate the three statistics. The values also serve as a quick sanity check that the points sit where NYC should be on the map.

**What to look for**
- Mean and median latitudes close to 40.7° N, which is where New York City sits.  
- A mode value can appear because multiple records share the same location.


In [4]:
import pandas as pd

df = pd.read_csv("NYC_Wi-Fi_Hotspot_Locations_20251108.csv")

# picking one numeric column: "Latitude"
s = pd.to_numeric(df["Latitude"], errors="coerce").dropna()

# building a small table for the result
stats_df = pd.DataFrame({
    "Statistic": ["Mean", "Median", "Mode"],
    "Latitude (deg)": [s.mean(), s.median(), s.mode().iloc[0]]
}).round(5)

stats_df




| | Statistic | Latitude (deg) |
|---|---|---|
| 0 | Mean | 40.74179 |
| 1 | Median | 40.74591 |
| 2 | Mode | 40.68719 |
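
The sanity check described above can be made explicit. This is a minimal sketch, not part of the original notebook: the bounds `NYC_LAT_MIN` and `NYC_LAT_MAX` are approximate values I am assuming for the five boroughs, and `share_within_bounds` is a hypothetical helper name.

```python
# Approximate latitude bounds for the five boroughs (an assumption for this sketch).
NYC_LAT_MIN = 40.47
NYC_LAT_MAX = 40.93


def share_within_bounds(latitudes, low=NYC_LAT_MIN, high=NYC_LAT_MAX):
    """Return the fraction of latitude values that fall inside [low, high]."""
    latitudes = list(latitudes)
    if not latitudes:
        raise ValueError("no latitude values to check")
    inside = sum(1 for lat in latitudes if low <= lat <= high)
    return inside / len(latitudes)


# The three statistics reported in the table above.
reported = {"Mean": 40.74179, "Median": 40.74591, "Mode": 40.68719}
for label, value in reported.items():
    in_range = NYC_LAT_MIN <= value <= NYC_LAT_MAX
    print(f"{label:6} {value:.5f} in range: {in_range}")
```

Passing the notebook's `s` series to `share_within_bounds` should return a value close to 1.0 if the file loaded cleanly; a much lower share would suggest swapped or corrupted coordinates.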


## Part B: The Hard Way

Here I compute the same three statistics with only the Python standard library. No pandas and no statistics module. This shows the logic would work even if I had to read and parse the CSV by hand.


In [5]:
# HARD WAY (standard library only)

import csv

path = "NYC_Wi-Fi_Hotspot_Locations_20251108.csv"
col_name = "Latitude"   # numeric column to analyze

# Reading the CSV and collecting numeric values from the chosen column
values = []
with open(path, newline="", encoding="utf-8") as f:
    reader = csv.reader(f)
    header = next(reader)
    idx = header.index(col_name)
    for row in reader:
        try:
            values.append(float(row[idx]))
        except (ValueError, IndexError):
            # skip blanks, non-numeric cells, or rows that are too short
            pass

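# Mean (hard way): sum divided by count
n = len(values)
mean_val = sum(values) / n

# Median (hard way): middle value, or the average of the two middle values
vals_sorted = sorted(values)
mid = n // 2
if n % 2 == 1:
    median_val = vals_sorted[mid]
else:
    median_val = (vals_sorted[mid - 1] + vals_sorted[mid]) / 2

# Mode (hard way): count each value, then take the most frequent one.
# Keys are inserted in ascending order, so a tie resolves to the smallest value.
counts = {}
for v in vals_sorted:
    counts[v] = counts.get(v, 0) + 1
mode_val = max(counts, key=counts.get)

# Print results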
print("Latitude — hard way (no pandas)")
print(f"Mean   : {mean_val:.5f}")
print(f"Median : {median_val:.5f}")
print(f"Mode   : {mode_val:.5f}")


Latitude — hard way (no pandas)
Mean   : 40.74179
Median : 40.74591
Mode   : 40.68719
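
One way to gain confidence in the hand-written logic is to compare it against Python's `statistics` module on a small sample. The sketch below is an assumption-laden check, not part of the original notebook: the module is used only as an independent reference, the function names are hypothetical, and the sample values are made up rather than taken from the dataset.

```python
import statistics


def mean_by_hand(values):
    """Sum divided by count."""
    return sum(values) / len(values)


def median_by_hand(values):
    """Middle value, or the average of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def mode_by_hand(values):
    """Most frequent value; ties resolve to the smallest value."""
    counts = {}
    for v in sorted(values):
        counts[v] = counts.get(v, 0) + 1
    # max() returns the first key with the top count, and keys were
    # inserted in ascending order.
    return max(counts, key=counts.get)


# Made-up sample values, not taken from the dataset.
sample = [40.70, 40.75, 40.75, 40.68, 40.68, 40.81]

assert abs(mean_by_hand(sample) - statistics.mean(sample)) < 1e-9
assert median_by_hand(sample) == statistics.median(sample)
# multimode lists every tied mode; the smallest one matches the hand version.
assert mode_by_hand(sample) == min(statistics.multimode(sample))
print("hand-written results match the statistics module")
```

The sample deliberately contains a tie for the mode (40.68 and 40.75 both appear twice) to show that the hand-written version picks the smaller value.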


## Part C: Where Hotspots Are Concentrated

Counting hotspots by borough gives a quick first view of access points across the city. I am not claiming causality; this is a fast visual sense of scale that a reader can understand without a plotting library.

**How to read the chart**
- Each `+` represents a fixed number of hotspots.
- The longest line has the most hotspots.
- The goal is a simple ranking you can read in seconds.

**Legend:** 1 `+` = 50 hotspots. Each bar is scaled by dividing the borough count by 50 and rounding to the nearest whole number, with a minimum of one `+` so that no borough disappears from the chart.


In [7]:
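# Text chart: hotspot counts by borough

# Standard NYC borough codes mapped to names
code_to_name = {1: "Manhattan", 2: "Bronx", 3: "Brooklyn", 4: "Queens", 5: "Staten Island"}

# Convert numeric borough codes to names (keeping any value that is not a
# known code as-is), then count rows per borough, largest first
by_boro = (
    df.assign(BoroughCode=pd.to_numeric(df["Borough"], errors="coerce"))
      .assign(Borough=lambda x: x["BoroughCode"].map(code_to_name).fillna(x["Borough"].astype(str)))
      .groupby("Borough").size().reset_index(name="Hotspots")
      .sort_values("Hotspots", ascending=False)
)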

unit = 50  # 1 '+' = 50 hotspots

print("NYC Wi-Fi hotspots by borough")
print(f"(Legend: 1 '+' = {unit} hotspots)\n")

for name, v in zip(by_boro["Borough"], by_boro["Hotspots"]):
    bar_len = max(1, int(round(v / unit)))
    print(f"{name:15} {'+' * bar_len}")



NYC Wi-Fi hotspots by borough
(Legend: 1 '+' = 50 hotspots)

Manhattan       +++++++++++++++++++++++++++++++++
Brooklyn        ++++++++++++++
Queens          +++++++++++
Bronx           ++++++
Staten Island   ++
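
The scaling rule from the legend can be tried in isolation. This is a small sketch with a hypothetical helper, `bar_length`, that wraps the same expression used in the chart loop; the counts are made up to show edge cases and are not real borough totals. One detail worth knowing is that Python's built-in `round` sends exact halves to the nearest even number, so 2.5 rounds down to 2 while 1.5 and 3.5 round up.

```python
def bar_length(count, unit=50):
    """Number of '+' signs for a count: nearest whole unit, minimum one."""
    return max(1, int(round(count / unit)))


# Hypothetical counts chosen to show the edge cases, not real borough totals.
for count in (10, 74, 75, 125, 175):
    print(f"{count:4d} -> {'+' * bar_length(count)}")
```

With these inputs, 10 still gets one `+` because of the minimum, and 75 and 125 both get two because 2.5 rounds to the even value.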


### What stands out

Three observations from this project:

- Hotspots are unevenly distributed. Manhattan's bar is more than twice as long as Brooklyn's, followed by Queens, the Bronx, and Staten Island. The `+` chart shows that difference at a glance.
- This view shows **raw counts**, not **per capita** figures. *Per capita* means “per person.” For example, **hotspots per 100,000 residents** = `(borough hotspot count / borough population) * 100,000`. A per capita comparison would be fairer across large and small boroughs, but it was outside the scope of this project.
- Counts do not tell us **quality**. A hotspot might be slow, indoors, or only available during certain hours. Quantity is not the same as usable access.
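
The per capita formula above could be sketched as a small function. This is only an illustration: `hotspots_per_100k` is a hypothetical helper, and the inputs are placeholder numbers, not real borough counts or populations. The multiplication is done before the division, which is mathematically the same as the formula and keeps the arithmetic exact for whole-number inputs.

```python
def hotspots_per_100k(hotspot_count, population):
    """Hotspots per 100,000 residents: (count / population) * 100,000."""
    if population <= 0:
        raise ValueError("population must be positive")
    return hotspot_count * 100_000 / population


# Placeholder inputs for illustration only, not real borough figures.
example_count = 500
example_population = 2_000_000
rate = hotspots_per_100k(example_count, example_population)
print(f"{rate:.1f} hotspots per 100,000 residents")
```

To apply it to the real data, each borough's count from `by_boro` would need to be paired with a population figure from an official source such as the census.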

**Note**  
I consulted AI (ChatGPT) during drafting and coding for quick checks and suggestions, especially for the "hard way" section.
