Project 1 - 2018 Central Park Squirrel Census

Robin Zhao

Data Source: https://data.cityofnewyork.us/Environment/2018-Central-Park-Squirrel-Census-Squirrel-Data/vfnx-vebw/about_data

Data Size: 31 columns × 3023 rows

Part 1: Pandas Version of Mean/Median/Mode of Hectare Squirrel Number

In [1]:
#Setting Up
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [2]:
#Loading Data
data = pd.read_csv("2018_Central_Park_Squirrel_Census__Squirrel_Data_20251106.csv")

#1 Mean of Hectare Squirrel Number:

In [3]:
mean_squirrelNumber = data['Hectare Squirrel Number'].mean()
print("Mean of Hectare Squirrel Number:", mean_squirrelNumber)

Mean of Hectare Squirrel Number: 4.123718160767449


#2 Median of Hectare Squirrel Number:

In [4]:
median_squirrelNumber = data['Hectare Squirrel Number'].median()
print("Median of Hectare Squirrel Number:", median_squirrelNumber)

Median of Hectare Squirrel Number: 3.0


#3 Mode of Hectare Squirrel Number:

In [5]:
mode_squirrelNumber = data['Hectare Squirrel Number'].mode()[0]  # Taking the first mode in case of multiple modes
print("Mode of Hectare Squirrel Number:", mode_squirrelNumber)

Mode of Hectare Squirrel Number: 1


Part 2: Python Version of Mean/Median/Mode of Hectare Squirrel Number

#1 Mean of Hectare Squirrel Number:

In [6]:
mean_python = sum(data['Hectare Squirrel Number']) / len(data['Hectare Squirrel Number'])
print("Mean calculated using pure Python:", mean_python)

Mean calculated using pure Python: 4.123718160767449


#2 Median of Hectare Squirrel Number:

In [7]:
sorted_numbers = sorted(data['Hectare Squirrel Number'])
n = len(sorted_numbers)
if n % 2 == 1:
    median_python = sorted_numbers[n // 2]
else:
    median_python = (sorted_numbers[n // 2 - 1] + sorted_numbers[n // 2]) / 2
print("Median calculated using pure Python:", median_python)


Median calculated using pure Python: 3


#3 Mode of Hectare Squirrel Number:

In [8]:
counts = {}
for number in data['Hectare Squirrel Number']:
    if number in counts:
        counts[number] += 1
    else:
        counts[number] = 1
max_count = max(counts.values()) # Find the highest frequency
modes_python = [number for number, count in counts.items() if count == max_count]
mode_python = modes_python[0]  # Taking the first mode in case of multiple modes
print("Mode calculated using pure Python:", mode_python)

Mode calculated using pure Python: 1
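The two mode calculations agree here because there is a single mode. With a tie they could differ: as far as I know, pandas returns the modes in sorted order, so `mode()[0]` is the smallest one, while the dictionary loop above returns whichever tied value appears first in the data. A minimal sketch of a tie-safe version, assuming a hypothetical helper `all_modes` and a made-up sample list (neither is part of the original notebook):

```python
from collections import Counter

def all_modes(values):
    """Return every value tied for the highest frequency, smallest first."""
    counts = Counter(values)          # value -> number of occurrences
    max_count = max(counts.values())  # highest frequency
    return sorted(v for v, c in counts.items() if c == max_count)

# Hypothetical sample with a tie between 1 and 2 (not the census data)
sample = [2, 1, 2, 1, 3]
modes = all_modes(sample)
print("All modes:", modes)
print("First mode after sorting:", modes[0])
```

Sorting the tied values before taking the first one makes the pure Python choice deterministic regardless of row order.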


Brief reflection: Both pandas and pure Python gave the same results, but the pure Python versions took noticeably more code to write and are likely to be slower on large datasets.
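There is also a middle ground between pandas and hand-written loops: the `statistics` module in the Python standard library. The sketch below is only an illustration on a small made-up list; in the notebook the list could instead come from something like `data['Hectare Squirrel Number'].tolist()`, which is an assumption and not part of the original cells.

```python
import statistics

# Hypothetical sample standing in for the column values (not the census data)
sample = [1, 1, 2, 3, 3, 3, 4, 7, 12]

mean_stdlib = statistics.mean(sample)        # arithmetic mean
median_stdlib = statistics.median(sample)    # middle value of the sorted data
modes_stdlib = statistics.multimode(sample)  # list of all most frequent values

print("Mean:", mean_stdlib)
print("Median:", median_stdlib)
print("Modes:", modes_stdlib)
```

`statistics.multimode` returns every tied value, so no tie-breaking decision is hidden inside the call.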

Part 3: Data Visualization

In [9]:
counts = {}
for number in data['Hectare Squirrel Number']:
    number = int(number)  # Convert to a plain Python int for counting and formatting
    if number in counts:
        counts[number] += 1
    else:
        counts[number] = 1
for n in sorted(counts):
    bar = "#" * (counts[n] // 10)   # One "#" per 10 observations; counts below 10 show no bar
    print(f"{n:2d}: {bar} ({counts[n]})")


print("\nSummary:")
print(f"Mean   = {data['Hectare Squirrel Number'].mean():.2f}")
print(f"Median = {data['Hectare Squirrel Number'].median()}")
print(f"Mode   = {data['Hectare Squirrel Number'].mode()[0]}")


 1: ############################################################# (614)
 2: ##################################################### (533)
 3: ############################################ (441)
 4: #################################### (364)
 5: ############################ (287)
 6: ###################### (223)
 7: ################ (161)
 8: ########### (119)
 9: ######## (85)
10: ##### (54)
11: #### (42)
12: ### (33)
13: ## (23)
14: # (16)
15: # (10)
16:  (8)
17:  (4)
18:  (1)
19:  (1)
20:  (1)
21:  (1)
22:  (1)
23:  (1)

Summary:
Mean   = 4.12
Median = 3.0
Mode   = 1
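In the chart above, values 16 through 23 show no bar because integer division by 10 rounds their counts down to zero. One possible alternative, sketched here under my own assumptions, scales each bar relative to the largest count and rounds up so that every nonzero count gets at least one symbol. The function name `text_histogram` and the `width` parameter are hypothetical, and the dictionary holds only a few of the frequencies printed above.

```python
def text_histogram(counts, width=50):
    """Return one text line per value, scaling the largest count to `width` symbols."""
    largest = max(counts.values())
    lines = []
    for value in sorted(counts):
        count = counts[value]
        # Integer ceiling division: any nonzero count gets at least one '#'
        bar_len = (count * width + largest - 1) // largest
        lines.append(f"{value:2d}: {'#' * bar_len} ({count})")
    return lines

# A few frequencies copied from the printed output above
subset = {1: 614, 5: 287, 10: 54, 16: 8, 23: 1}
for line in text_histogram(subset):
    print(line)
```

The trade-off is that a single symbol no longer stands for a fixed number of observations, so the count in parentheses stays important for reading exact values.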


Conclusion:

In this project, I analyzed the Hectare Squirrel Number column from the 2018 Central Park Squirrel Census dataset.

First, using pandas, I computed the mean, median, and mode efficiently to understand the general distribution of squirrel counts per hectare. Then, I recreated the same calculations using pure Python. Finally, I built a simple visualization using only the Python standard library. This visualization displayed the frequency distribution of squirrel numbers as a histogram made of symbols, providing an intuitive overview of the dataset while meeting the project's restrictions on visualization tools.

Overall, the results showed that most values fell between 1 and 5 (2239 of 3023 rows, about 74%), with a long tail of higher values reaching 23. The mean (4.12) is larger than the median (3) and the mode (1), which is consistent with this right-skewed shape. This suggests that most hectares hold only a few squirrels, with occasional clusters in specific regions.
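The concentration of values between 1 and 5 can be checked directly from the frequency table printed in Part 3. The sketch below copies those frequencies into a dictionary and recomputes the total, the share of values from 1 to 5, and the mean; the variable names are my own and not from the notebook.

```python
# Frequencies copied from the text histogram output in Part 3
counts = {
    1: 614, 2: 533, 3: 441, 4: 364, 5: 287, 6: 223, 7: 161, 8: 119,
    9: 85, 10: 54, 11: 42, 12: 33, 13: 23, 14: 16, 15: 10, 16: 8,
    17: 4, 18: 1, 19: 1, 20: 1, 21: 1, 22: 1, 23: 1,
}

total = sum(counts.values())                        # number of rows
low_total = sum(c for v, c in counts.items() if 1 <= v <= 5)
share_1_to_5 = low_total / total                    # fraction of rows with value 1-5
mean_from_counts = sum(v * c for v, c in counts.items()) / total

print(f"Total rows        = {total}")
print(f"Share with 1 to 5 = {share_1_to_5:.1%}")
print(f"Mean from counts  = {mean_from_counts:.2f}")
```

Recomputing the mean from the table is also a quick consistency check against the pandas result reported earlier.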

Reflection:

Through this project, I learned how to approach data analysis from both a pandas and pure Python perspective.

Using pandas was fast and convenient, while the pure Python version helped me better understand how statistical calculations actually work behind the scenes.
I also realized that creating visualizations without external libraries requires careful thinking about how to represent data clearly using only text and symbols.

Although my pandas and pure Python results were consistent, the manual method was much more challenging.