# Section 1: Title - Data In Hand
### Name: Aubrey Nighman  
### Professor: Sandeep More 
### Class: Data in EMAT
### Date: November 10, 2025

**Data Question:**
Do food-related posts get more engagement at night than during the day?


In my original proposal, I planned to use the Bluesky API, but I switched to Reddit's public JSON API because the Bluesky search and timeline endpoints were blocked. The research question is unchanged, and Reddit provides the food-related posts, timestamps, and engagement counts my project needs.


In [6]:
# Section 2: Imports

import requests
import pandas as pd

In [7]:
# Section 3: Data Collection

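url = "https://www.reddit.com/r/food/hot.json"
# Reddit may throttle or block requests that lack a descriptive User-Agent
headers = {"User-Agent": "DataInEMATProject/0.1"}

response = requests.get(
    url,
    headers=headers,
    params={"limit": 100},  # 100 is the most posts Reddit returns per request
    timeout=10
)
response.raise_for_status()  # stop early on HTTP errors (e.g., 429 rate limiting)

data = response.json()
posts = data["data"]["children"]  # each child wraps one post under its "data" key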

all_posts = []

for item in posts:
    d = item["data"]
    all_posts.append({
        "text": d.get("title", ""),
        "createdAt": d.get("created_utc", None),  # UTC seconds
        "upvotes": d.get("ups", 0),
        "comments": d.get("num_comments", 0),
        "author": d.get("author", "")
    })

len(all_posts)

100

In [8]:
# Section 4: DataFrame

df = pd.DataFrame(all_posts)

# created_utc is a UNIX timestamp in seconds; the result is a naive datetime in UTC
df["createdAt"] = pd.to_datetime(df["createdAt"], unit="s")

df.head()


|   | text | createdAt | upvotes | comments | author |
|---|------|-----------|---------|----------|--------|
| 0 | About future Bon Appétit/Condé Nast AMAs on r/... | 2025-11-10 23:15:33 | 0 | 5 | Sun_Beams |
| 1 | [Homemade] Smoked glazed spiral ham, baked mac... | 2025-11-14 00:15:05 | 271 | 17 | softrotten |
| 2 | [Homemade] Poulet Mafé | 2025-11-13 17:15:44 | 716 | 25 | UhhhImTrashSorry |
| 3 | [Homemade] Chicken parmesan with alfredo pappa... | 2025-11-13 17:56:23 | 445 | 20 | ItsDreamyWeather |
| 4 | [Homemade] Chicken Parm | 2025-11-13 18:08:37 | 395 | 8 | stillstillers |


In [9]:
# Section 5: Summary

print("Number of posts collected:", len(df))
df.info()


Number of posts collected: 100
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 5 columns):
 #   Column     Non-Null Count  Dtype         
---  ------     --------------  -----         
 0   text       100 non-null    object        
 1   createdAt  100 non-null    datetime64[ns]
 2   upvotes    100 non-null    int64         
 3   comments   100 non-null    int64         
 4   author     100 non-null    object        
dtypes: datetime64[ns](1), int64(2), object(2)
memory usage: 4.0+ KB


# Section 6: Data Structure
This dataset contains food-related posts collected from the r/food subreddit through Reddit's public JSON API. The DataFrame has the following columns:

- `text`: the title of the Reddit post
- `createdAt`: when the post was created, converted from UNIX seconds to a datetime (in UTC)
- `upvotes`: the number of upvotes the post received
- `comments`: the number of comments on the post
- `author`: the username of the person who made the post

Together, these columns provide the timing and engagement information needed to compare how food content is posted and engaged with at different times of day.
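One way to make the data question measurable is to label each post as "day" or "night" and compare engagement between the two groups. The sketch below is a minimal illustration under stated assumptions, not the method used in this project: the helper `label_time_of_day`, the `America/New_York` time zone, and the 6 PM to 6 AM "night" window are all hypothetical choices. It reuses the five rows from the `df.head()` output above. Because `createdAt` is stored in UTC, the timestamps are converted to a local zone before extracting the hour; r/food posters live in many time zones, so any single zone is a simplification.

```python
import pandas as pd

# The five rows shown in the df.head() output above (titles and authors omitted)
sample = pd.DataFrame({
    "createdAt": pd.to_datetime([
        "2025-11-10 23:15:33", "2025-11-14 00:15:05", "2025-11-13 17:15:44",
        "2025-11-13 17:56:23", "2025-11-13 18:08:37",
    ]),
    "upvotes": [0, 271, 716, 445, 395],
    "comments": [5, 17, 25, 20, 8],
})


def label_time_of_day(df, tz="America/New_York", night_start=18, night_end=6):
    """Hypothetical helper: tag each post as "day" or "night" in local time.

    Assumes createdAt holds naive UTC datetimes (as produced by
    pd.to_datetime(..., unit="s")). The time zone and hour cutoffs
    are assumptions, not part of the original analysis.
    """
    # Mark the naive timestamps as UTC, then shift them to the chosen zone
    local = df["createdAt"].dt.tz_localize("UTC").dt.tz_convert(tz)
    hour = local.dt.hour
    # Night wraps past midnight: night_start (inclusive) to night_end (exclusive)
    is_night = (hour >= night_start) | (hour < night_end)
    out = df.copy()
    out["localHour"] = hour
    out["period"] = is_night.map({True: "night", False: "day"})
    return out


labeled = label_time_of_day(sample)
# Count, mean, and median engagement per period
summary = labeled.groupby("period")[["upvotes", "comments"]].agg(["count", "mean", "median"])
print(labeled[["createdAt", "localHour", "period"]])
print(summary)
```

With these cutoffs, the first two rows (posted at 18:15 and 19:15 Eastern time) are labeled night and the other three day. Medians are reported alongside means because a few very popular posts can dominate an average, and five rows are far too few to support any conclusion. On the full dataset, `label_time_of_day(df)` could be applied the same way.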


# Section 7: Data Collection

For this checkpoint of the final project, I collected food-related posts from the r/food subreddit using Reddit's publicly available “hot” feed.

I requested up to 100 posts (the most Reddit returns in a single request) and recorded each post’s title, creation time, number of upvotes, number of comments, and author. These fields provide the engagement and timing information needed to answer my research question.
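The collection steps described above can also be written as a small function that works on an already-downloaded listing, which makes it testable without a network connection. This is a sketch under assumptions: `parse_listing` is a hypothetical helper, and the sample listing is hand-made with illustrative values only. It adds two things the original loop does not do: it optionally skips pinned posts using Reddit's `stickied` field (row 0 in the `df.head()` output, a moderator-style announcement with 0 upvotes, may be one of these), and it returns the listing's `after` cursor.

```python
def parse_listing(listing, skip_stickied=True):
    """Hypothetical helper: flatten one Reddit listing JSON dict into rows.

    Returns (rows, after), where `after` is the listing's pagination
    cursor, or None when Reddit reports no further page.
    """
    rows = []
    for child in listing["data"]["children"]:
        d = child["data"]
        # Optionally drop pinned posts such as moderator announcements
        if skip_stickied and d.get("stickied", False):
            continue
        rows.append({
            "text": d.get("title", ""),
            "createdAt": d.get("created_utc"),  # UNIX seconds, UTC
            "upvotes": d.get("ups", 0),
            "comments": d.get("num_comments", 0),
            "author": d.get("author", ""),
        })
    return rows, listing["data"].get("after")


# Tiny hand-made listing with the same nesting as hot.json (illustrative values only)
example = {
    "data": {
        "after": "t3_example",
        "children": [
            {"data": {"title": "Mod announcement", "stickied": True,
                      "created_utc": 1700000000, "ups": 0,
                      "num_comments": 5, "author": "example_mod"}},
            {"data": {"title": "[Homemade] Chicken Parm", "stickied": False,
                      "created_utc": 1700003600, "ups": 395,
                      "num_comments": 8, "author": "example_user"}},
        ],
    }
}

rows, after = parse_listing(example)
print(len(rows), rows[0]["text"], after)
```

To collect more than 100 posts, the cursor could be passed back in a follow-up request, for example `params={"limit": 100, "after": after}`, repeating until `after` is `None`. This loop is an untested assumption about how the project might scale up, and it should be paced to respect Reddit's rate limits.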