1. Filter the data to include only weekdays (Monday to Friday) and
plot a line graph showing the pedestrian counts for each day of the
week.

In [2]:
# Read the dataset
import pandas as pd
import matplotlib.pyplot as plt
url = "https://data.cityofnewyork.us/api/views/6fi9-q3ta/rows.csv?accessType=DOWNLOAD"
df = pd.read_csv(url)

2. Track pedestrian counts on the Brooklyn Bridge for the year 2019
and analyze how different weather conditions influence pedestrian
activity in that year. Sort the pedestrian count data by weather
summary to identify any correlations (with a correlation matrix)
between weather patterns and pedestrian counts for the selected year.

-This question asks you to show the relationship between a
numerical feature (Pedestrians) and a non-numerical feature (Weather
Summary). In such cases we use encoding. With one-hot encoding, each
weather condition becomes its own binary (0/1) column. Assigning each
condition a single integer (0, 1, 2, ...) is a different technique
called label encoding.
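
To make the encoding idea concrete, here is a minimal sketch on made-up data. The weather labels and counts are hypothetical and are not taken from the dataset. It contrasts integer codes per category (label encoding) with the 0/1 indicator columns that `pd.get_dummies` produces (one-hot encoding).

```python
import pandas as pd

# Hypothetical toy data; labels and counts are illustrative only
toy = pd.DataFrame({
    "Weather Summary": ["clear-day", "rain", "cloudy", "rain"],
    "Pedestrians": [120, 30, 80, 25],
})

# Label encoding: one integer code per category (categories sorted alphabetically)
label_codes = toy["Weather Summary"].astype("category").cat.codes

# One-hot encoding: one 0/1 indicator column per category
one_hot = pd.get_dummies(toy["Weather Summary"], prefix="Weather", dtype=int)

print(label_codes.tolist())
print(one_hot)
```

Integer codes impose an arbitrary order on the categories, which is why one-hot columns are usually the safer input for a correlation matrix.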

-Correlation matrices are not always the most suitable way to
examine relationships involving categorical data. Nonetheless, this
was given as a question to help you understand the concept better.
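
One common alternative for a categorical feature is to compare summary statistics per category. The sketch below uses hypothetical toy values (not the real dataset) and the column names assumed elsewhere in this notebook.

```python
import pandas as pd

# Hypothetical toy data; values are illustrative only
weather_toy = pd.DataFrame({
    "Weather Summary": ["clear-day", "rain", "cloudy", "rain", "clear-day"],
    "Pedestrians": [120, 30, 80, 20, 100],
})

# Mean, median and number of observations per weather condition,
# sorted so the busiest conditions come first
summary = (
    weather_toy.groupby("Weather Summary")["Pedestrians"]
    .agg(["mean", "median", "count"])
    .sort_values("mean", ascending=False)
)
print(summary)
```

A box plot of `Pedestrians` grouped by `Weather Summary` would show the same comparison visually.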

3. Implement a custom function to categorize time of day into morning,
afternoon, evening, and night, and create a new column in the
DataFrame to store these categories. Use this new column to analyze
pedestrian activity patterns throughout the day.

-Students can also show plots analyzing activity.
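
As one possible approach, the categorization can also be written as a custom function that bins hours with `pd.cut` instead of testing each row. This is a sketch under assumptions: the cut-offs 05:00, 12:00, 17:00 and 21:00 are a choice (the task does not fix them), and `categorize_hours` is a hypothetical helper name.

```python
import pandas as pd

def categorize_hours(hours):
    """Bin integer hours (0-23) into time-of-day labels."""
    return pd.cut(
        hours,
        bins=[0, 5, 12, 17, 21, 24],  # left-closed intervals: [0,5), [5,12), ...
        right=False,
        labels=['Night', 'Morning', 'Afternoon', 'Evening', 'Night'],
        ordered=False,  # needed because 'Night' appears twice
    )

# Hypothetical sample of hours covering each boundary
sample_hours = pd.Series([0, 5, 11, 12, 16, 17, 20, 21, 23])
labels = categorize_hours(sample_hours)
print(labels.tolist())
```

The result can be assigned to a new DataFrame column and used in a `groupby` like any other category.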

In [None]:
# ----- Task 1: Filter Weekdays and Plot Pedestrian Counts -----
# Convert 'Date' column to datetime
df['Date'] = pd.to_datetime(df['Date'])

# Extract day of the week (0=Monday, 6=Sunday)
df['DayOfWeek'] = df['Date'].dt.dayofweek

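# Filter for weekdays (Monday=0 to Friday=4)
# .copy() makes an independent DataFrame so adding a column below does not
# trigger pandas' SettingWithCopyWarning
weekdays_df = df[df['DayOfWeek'] <= 4].copy()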

# Map day numbers to names
day_mapping = {0: 'Monday', 1: 'Tuesday', 2: 'Wednesday', 3: 'Thursday', 4: 'Friday'}
weekdays_df['DayName'] = weekdays_df['DayOfWeek'].map(day_mapping)

# Aggregate pedestrian counts by day of the week
pedestrian_counts = weekdays_df.groupby('DayName')['Pedestrians'].sum().reindex(['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday'])

# Plotting the line graph
plt.figure(figsize=(10,6))
plt.plot(pedestrian_counts.index, pedestrian_counts.values, marker='o', linestyle='-')
plt.title('Total Pedestrian Counts by Weekday')
plt.xlabel('Day of the Week')
plt.ylabel('Number of Pedestrians')
plt.grid(True)
plt.show()

# ----- Task 2: Analyze Pedestrian Counts on Brooklyn Bridge in 2019 -----

# Filter for Brooklyn Bridge
# Assuming there's a 'Location' column; adjust the column name as per your dataset
brooklyn_bridge_df = df[df['Location'].str.contains('Brooklyn Bridge', case=False, na=False)]

# Filter for the year 2019
brooklyn_bridge_df = brooklyn_bridge_df[brooklyn_bridge_df['Date'].dt.year == 2019]

# Drop rows with missing values in 'Pedestrians' or 'Weather Summary'
brooklyn_bridge_df = brooklyn_bridge_df.dropna(subset=['Pedestrians', 'Weather Summary'])

# One-hot encode the 'Weather Summary' column
# dtype=int gives 0/1 columns (newer pandas versions default to booleans)
weather_dummies = pd.get_dummies(brooklyn_bridge_df['Weather Summary'], prefix='Weather', dtype=int)

# Combine the dummy variables with the main dataframe
brooklyn_bridge_encoded = pd.concat([brooklyn_bridge_df, weather_dummies], axis=1)

# Select numerical columns for correlation
numerical_cols = ['Pedestrians'] + list(weather_dummies.columns)
corr_matrix = brooklyn_bridge_encoded[numerical_cols].corr()

# Display the correlation matrix
print("Correlation Matrix:")
print(corr_matrix)

# Visualize the correlation matrix
import seaborn as sns

plt.figure(figsize=(12,8))
sns.heatmap(corr_matrix, annot=True, fmt=".2f", cmap='coolwarm')
plt.title('Correlation Matrix: Pedestrians vs Weather Conditions (Brooklyn Bridge, 2019)')
plt.show()

# ----- Task 3: Categorize Time of Day and Analyze Pedestrian Activity Patterns -----

# Ensure 'Time' column is in datetime format; assuming it's a string like 'HH:MM:SS'
brooklyn_bridge_encoded['Time'] = pd.to_datetime(brooklyn_bridge_encoded['Time'], format='%H:%M:%S').dt.time

# Define the categorization function
def categorize_time(time_obj):
    """Map a datetime.time to 'Morning', 'Afternoon', 'Evening', or 'Night'."""
    hour = time_obj.hour  # boundaries fall on whole hours, so the hour alone decides
    if 5 <= hour < 12:
        return 'Morning'
    elif 12 <= hour < 17:
        return 'Afternoon'
    elif 17 <= hour < 21:
        return 'Evening'
    else:
        return 'Night'

# Apply the function to create a new column
brooklyn_bridge_encoded['TimeOfDay'] = brooklyn_bridge_encoded['Time'].apply(categorize_time)

# Analyze pedestrian counts across different times of the day
time_of_day_counts = brooklyn_bridge_encoded.groupby('TimeOfDay')['Pedestrians'].sum().reindex(['Morning', 'Afternoon', 'Evening', 'Night'])

print("\nPedestrian Counts by Time of Day:")
print(time_of_day_counts)

# Plotting the pedestrian activity patterns
plt.figure(figsize=(10,6))
time_of_day_counts.plot(kind='bar', color=['skyblue', 'orange', 'green', 'purple'])
plt.title('Total Pedestrian Counts by Time of Day (Brooklyn Bridge, 2019)')
plt.xlabel('Time of Day')
plt.ylabel('Number of Pedestrians')
plt.xticks(rotation=45)
plt.show()