<a href="https://colab.research.google.com/github/AMSUCF/DHProgramming/blob/main/Bluesky_Solution.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Social Media Analysis with AI Assistance
*Building on Your Combinatorial Text Experience*

## Getting Started Reminders

### Before You Begin:
1. **Set up Bluesky credentials** in Colab Secrets (left sidebar → 🔑)
   - Add `BLUESKY_USERNAME` (your.handle.bsky.social)
   - Add `BLUESKY_APP_PASSWORD` (generate in Bluesky Settings → App Passwords)

2. **Review AI assistance levels** from the workshop:
   - **Level 1:** Code comprehension & debugging
   - **Level 2:** Conceptual application & adaptation  
   - **Level 3:** Critical evaluation & extension

### Jupyter Workflow Tips:
- **Test in new cells** before modifying working code
- **Comment out** previous versions instead of deleting
- **Use markdown cells** to document your AI conversations
- **Save successful iterations** before experimenting further

### Recommended Cell Organization:
1. **Setup Cell:** Libraries and authentication (run once)
2. **Data Collection Cell:** API calls (modify and re-run as needed)
3. **Processing Cell:** Clean and structure your data
4. **Analysis Cells:** Individual analyses (iterate with AI)
5. **Visualization Cell:** Final outputs and interpretations

---

## Step 1: Setup and Authentication
*Add your code cell below to install libraries and authenticate with Bluesky*

**AI Prompt Starters:**
- "Help me install the required libraries for Bluesky API and data analysis"
- "I'm getting an authentication error. What might be wrong?"
- "Show me how to securely store and access API credentials in Colab"

In [1]:
# prompt: Install just the required libraries to access the Bluesky API

!pip install atproto
!pip install nltk

Collecting atproto
  Downloading atproto-0.0.61-py3-none-any.whl.metadata (14 kB)
Collecting dnspython<3,>=2.4.0 (from atproto)
  Downloading dnspython-2.7.0-py3-none-any.whl.metadata (5.8 kB)
Collecting libipld<4,>=3.0.1 (from atproto)
  Downloading libipld-3.0.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.6 kB)
Collecting websockets<14,>=12 (from atproto)
  Downloading websockets-13.1-cp311-cp311-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.8 kB)
Downloading atproto-0.0.61-py3-none-any.whl (380 kB)
Downloading dnspython-2.7.0-py3-none-any.whl (313 kB)
Downloading libipld-3.0.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (682 kB)

In [3]:
# prompt: Authenticate me to the Bluesky API using the secret names above

from google.colab import userdata
from atproto import Client

# Bluesky authentication
bluesky_client = Client()
try:
    bluesky_client.login(userdata.get('BLUESKY_USERNAME'), userdata.get('BLUESKY_APP_PASSWORD'))
    print("Bluesky authentication successful!")
except Exception as e:
    print(f"Bluesky authentication failed: {e}")
    print("Please check your BLUESKY_USERNAME and BLUESKY_APP_PASSWORD in Colab Secrets.")


Bluesky authentication successful!


In [8]:
# search for up to 10 matching posts
resp = bluesky_client.app.bsky.feed.search_posts({
    "q": "#ai",
    "limit": 10
})

for post in resp.posts:
    author = post.author.handle
    text   = post.record.text
    print(f"@{author}: {text}\n")

@mrblackk.bsky.social: Art from my Brain 

Dragon Ball 

#art #artist #love #drawing #aiart #artwork #dragontuesday #ai #like #illustration #digitalart #aiartcommunity #gothic #gothaesthetic #picoftheday #spooky #midjourney #aiart #midjourneyart #skullart #midjourneyai #darkart #aiartist #epic #artoftheday #aiartists

@bluesky.awakari.com: KOLO Launches Next-Generation Digital Wallet with Worldwide Debit Card, Bridging Digital Assets and Everyday Spending Astana City, Kazakhstan, [May 28th, 2025] — KOLO, a leading web3 project, has officially launched its innovative digital wallet with ...

| Details | Interest | Feed |

@mrblackk.bsky.social: Self Portrait 

#art #artist #love #drawing #aiart #artwork #photooftheday #painting #ai #like #illustration #digitalart #aiartcommunity #gothic #gothaesthetic #picoftheday #spooky #midjourney #aiart #midjourneyart #skullart #midjourneyai #darkart #aiartist #epic #artoftheday #aiartists

@mrblackk.bsky.social: Art from my Brain 

Caesar 

#Ai #ro

In [15]:
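# Assumes `all_posts` is a list of post views collected earlier (for example,
# by paging through `search_posts` results); it is not defined in the cells shown above.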
import pandas as pd

records = []
for post in all_posts:
    author = post.author
    rec    = post.record

    # base fields
    row = {
        "author_handle":         author.handle,
        "author_display_name":   author.display_name,
        "author_did":            author.did,
        "author_avatar_url":     author.avatar,
        "author_created_at":     author.created_at,    # snake_case
        "post_text":             rec.text.replace("\n", " "),
        "post_created_at":       rec.created_at,       # snake_case
        "post_uri":              post.uri,
    }

    # any embedded URLs in facets (e.g. links)
    linked_urls = []
    if rec.facets:
        for facet in rec.facets:
            for feat in facet.features:
                # feature types can vary; Link has a `.uri`
                if hasattr(feat, "uri"):
                    linked_urls.append(feat.uri)
    row["linked_urls"] = ",".join(linked_urls) if linked_urls else None

    # if you want the post’s embed object (e.g. an image/video)
    if rec.embed:
        # embed types differ in their attributes, so serialize generically;
        # `model_dump()` replaces the `dict()` method deprecated in Pydantic V2
        row["embed"] = rec.embed.model_dump()
    else:
        row["embed"] = None

    records.append(row)

# build the DataFrame
df = pd.DataFrame(records)

# inspect columns
print(df.columns.tolist())

# write the flattened table to CSV
df.to_csv("/content/ai_posts_flat.csv", index=False)
print(f"Wrote {len(df)} rows to /content/ai_posts_flat.csv")


['author_handle', 'author_display_name', 'author_did', 'author_avatar_url', 'author_created_at', 'post_text', 'post_created_at', 'post_uri', 'linked_urls', 'embed']
Wrote 1000 rows to /content/ai_posts_flat.csv


## Step 2: Data Collection
*Create cells below to collect your research corpus from Bluesky*

**Consider:**
- What users or hashtags relate to your research interest?
- How many posts do you need for meaningful analysis?
- What time period should your data cover?

**AI Prompt Starters:**
- "Help me write a function to collect posts from specific users"
- "How do I search for posts containing certain hashtags?"
- "My data collection is only getting a few posts. How can I get more?"
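
If a single search returns only a handful of posts, one common approach is to follow the cursor the API hands back and keep requesting pages. The sketch below is one possible way to do that, not the collection code used to build this notebook's dataset (which is not shown). `collect_posts` and `bluesky_fetch_page` are hypothetical helper names, and the sketch assumes the `search_posts` response exposes `.posts` and `.cursor` and that the endpoint accepts at most 100 posts per request.

```python
def collect_posts(fetch_page, query, max_posts=1000, page_size=100):
    """Follow cursors until max_posts are collected or results run out.

    fetch_page(query, limit, cursor) must return (posts, next_cursor).
    """
    all_posts = []
    cursor = None
    while len(all_posts) < max_posts:
        limit = min(page_size, max_posts - len(all_posts))
        posts, cursor = fetch_page(query, limit, cursor)
        all_posts.extend(posts)
        # Stop when the service returns no posts or no cursor for a next page.
        if not posts or not cursor:
            break
    return all_posts[:max_posts]


def bluesky_fetch_page(client):
    """Wrap an authenticated atproto Client as a fetch_page function.

    Assumption: the response object has `.posts` and `.cursor` attributes.
    """
    def fetch(query, limit, cursor):
        params = {"q": query, "limit": limit}
        if cursor:
            params["cursor"] = cursor
        resp = client.app.bsky.feed.search_posts(params)
        return resp.posts, resp.cursor
    return fetch
```

With the authenticated client from Step 1, a call such as `all_posts = collect_posts(bluesky_fetch_page(bluesky_client), "#ai", max_posts=1000)` would produce the kind of list that the flattening cell above expects. Keeping the paging loop separate from the API call also makes it easy to test the loop with fake data before spending real requests.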

## Step 3: Data Processing
*Transform raw API data into analysis-ready format*

**Key Tasks:**
- Convert API responses to pandas DataFrame
- Extract relevant features (timestamps, engagement, text length, etc.)
- Clean and validate your data

**AI Prompt Starters:**
- "Convert this Bluesky API response into a pandas DataFrame"
- "Help me extract and clean timestamps from social media data"
- "I have missing values in my dataset. How should I handle them?"
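
As a starting point for the cleaning tasks above, here is a minimal sketch that parses timestamps and derives a few simple features from flattened rows. It assumes the `post_text` and `post_created_at` keys from the flattening cell and that timestamps arrive as ISO 8601 strings; `parse_timestamp` and `add_features` are hypothetical helper names, and treating offset-free timestamps as UTC is an assumption you may need to revisit.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_timestamp(value: Optional[str]) -> Optional[datetime]:
    """Parse an ISO 8601 string into a UTC datetime, or return None."""
    if not value:
        return None
    try:
        # Older Python versions reject a trailing "Z", so spell out the offset.
        parsed = datetime.fromisoformat(value.strip().replace("Z", "+00:00"))
    except ValueError:
        return None
    if parsed.tzinfo is None:
        # Assumption: timestamps without an offset are already in UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def add_features(row: dict) -> dict:
    """Return a copy of a flattened post row with derived columns added."""
    text = (row.get("post_text") or "").strip()
    enriched = dict(row)
    enriched["post_text"] = text
    enriched["text_length"] = len(text)
    enriched["word_count"] = len(text.split())
    enriched["created_dt"] = parse_timestamp(row.get("post_created_at"))
    return enriched


rows = [
    {"post_text": " Hello #ai world ", "post_created_at": "2025-05-28T14:03:22.123Z"},
    {"post_text": None, "post_created_at": "not a date"},
]
cleaned = [add_features(r) for r in rows]
# Rows with unparseable timestamps are kept but are easy to filter out.
valid = [r for r in cleaned if r["created_dt"] is not None]
```

Returning `None` for bad values, rather than raising, lets you count and inspect the problem rows before deciding whether to drop them. The enriched dictionaries can be passed straight to `pd.DataFrame(...)`.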

## Step 4: Content Analysis
*Analyze patterns in your collected text data*

**Analysis Ideas:**
- Categorize posts by topic or theme
- Analyze word frequency and key terms
- Compare content types and their engagement

**AI Prompt Starters:**
- "Create a function to categorize posts based on academic, literary, or general content"
- "How do I analyze word frequency in my social media corpus?"
- "My text categorization isn't working well. Help me debug and improve it"
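
The word-frequency and categorization ideas above can be sketched with the standard library alone. Everything named here is illustrative: the stopword list is deliberately tiny, and the keyword sets in `CATEGORY_KEYWORDS` are hypothetical placeholders that you would replace with terms drawn from your own corpus.

```python
import re
from collections import Counter

# A deliberately small stopword list; extend it for real analysis.
STOPWORDS = {"a", "an", "and", "the", "of", "on", "in", "to", "is", "was", "for", "my", "from"}

# Hypothetical keyword sets; replace with terms that fit your research question.
CATEGORY_KEYWORDS = {
    "academic": {"research", "study", "paper", "journal", "conference"},
    "literary": {"poem", "poetry", "novel", "fiction", "story"},
}


def tokenize(text):
    """Lowercase the text, drop URLs, and keep alphabetic tokens of 2+ characters."""
    without_urls = re.sub(r"https?://\S+", " ", text.lower())
    tokens = re.findall(r"[a-z][a-z']+", without_urls)
    return [t for t in tokens if t not in STOPWORDS]


def word_frequencies(texts):
    """Count tokens across an iterable of post texts."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts


def categorize_post(text):
    """Pick the category with the most keyword hits, or "general" if none match.

    Ties go to whichever category is listed first in CATEGORY_KEYWORDS.
    """
    tokens = set(tokenize(text))
    scores = {name: len(tokens & words) for name, words in CATEGORY_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "general"
```

For example, `word_frequencies(df["post_text"]).most_common(20)` would list the most frequent terms in the flattened data. Keyword matching like this is crude, so it is worth reading a sample of posts in each category to see where it misfires.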

## Step 5: Temporal Analysis
*Examine patterns over time in your data*

**Questions to Explore:**
- When are users most active?
- How does engagement vary by time of day or day of week?
- Are there notable spikes or patterns in posting activity?

**AI Prompt Starters:**
- "Analyze posting patterns by hour and day of week in my dataset"
- "How do I identify unusual activity periods in my temporal data?"
- "Create visualizations showing posting activity over time"

## Step 6: Visualization
*Create compelling visualizations of your findings*

**Visualization Goals:**
- Make patterns visible and interpretable
- Support your analytical arguments
- Communicate findings to your intended audience

**AI Prompt Starters:**
- "Create a comprehensive dashboard showing key patterns in my social media data"
- "This scatter plot is too crowded. How can I make it clearer?"
- "What additional visualizations would reveal patterns I might be missing?"
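
As one small example of making a pattern visible, the sketch below turns hourly post counts into a bar chart. It assumes matplotlib is available (it is preinstalled in Colab) and that you have a mapping from hour to post count; `hourly_series` and `plot_hourly_activity` are hypothetical helper names.

```python
from collections import Counter


def hourly_series(hour_counts):
    """Expand a sparse {hour: count} mapping into 24 aligned values."""
    hours = list(range(24))
    return hours, [hour_counts.get(h, 0) for h in hours]


def plot_hourly_activity(hour_counts, title="Posts by hour (UTC)"):
    """Draw a bar chart of posts per hour and return the figure."""
    # Imported here so hourly_series can be used without matplotlib installed.
    import matplotlib.pyplot as plt

    hours, counts = hourly_series(hour_counts)
    fig, ax = plt.subplots(figsize=(8, 3))
    ax.bar(hours, counts)
    ax.set_xticks(hours)
    ax.set_xlabel("Hour of day")
    ax.set_ylabel("Number of posts")
    ax.set_title(title)
    fig.tight_layout()
    return fig


# Example input: three posts at 09:00 and five at 14:00.
example_counts = Counter({9: 3, 14: 5})
hours, counts = hourly_series(example_counts)
```

Filling in the empty hours with zeros keeps the x-axis honest: a gap in activity shows up as a gap in the chart instead of being silently skipped. Calling `plot_hourly_activity(example_counts)` in a notebook cell displays the figure.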

## Step 7: Interpretation and Analysis
*Connect computational findings to your research questions*

**Critical Questions:**
- What do these patterns reveal about the community or phenomenon you're studying?
- How do computational findings compare to traditional research methods?
- What are the limitations of your approach and data?

**AI Prompt Starters:**
- "Help me interpret these engagement patterns in the context of [your discipline]"
- "What are the potential biases in my social media dataset?"
- "How can I validate these computational results against other sources?"
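
One concrete bias check is author concentration. In the sample search output earlier in this notebook, a single account appears several times among the first few results, so a few prolific accounts may dominate a hashtag corpus. The sketch below measures that share; `author_concentration` is a hypothetical helper, and what counts as "too concentrated" is a judgment for you to make.

```python
from collections import Counter


def author_concentration(handles, top_n=5):
    """Return (handle, post_count, share_of_corpus) for the most active authors."""
    counts = Counter(handles)
    total = sum(counts.values())
    if total == 0:
        return []
    return [
        (handle, n, round(n / total, 3))
        for handle, n in counts.most_common(top_n)
    ]


# Example with made-up handles: one author wrote 6 of 10 posts.
example = ["a.example"] * 6 + ["b.example"] * 3 + ["c.example"]
top_authors = author_concentration(example, top_n=2)
```

Running `author_concentration(df["author_handle"])` on the flattened data would show whether your findings describe a community or mostly a few very active accounts.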

## Advanced Extensions (Optional)
*For deeper analysis if you have time and interest*

**Possible Extensions:**
- Network analysis of user interactions
- Topic modeling to identify themes
- Sentiment analysis of posts
- Comparison with other datasets or time periods

**AI Prompt Starters:**
- "Help me implement basic network analysis for user mentions in my data"
- "Create a topic modeling analysis to identify themes in my corpus"
- "How do I add sentiment analysis to my existing content analysis?"
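
For the network analysis extension, a first step is usually an edge list of who mentions whom. The sketch below extracts `@handle` mentions from post text with a regular expression, which is an approximation: it assumes handles look like dotted domain names, it will also match email addresses, and mention facets on the post record would be a more reliable source if you have them. `mention_edges` is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches @name.domain.tld style handles; a rough approximation only.
MENTION_PATTERN = re.compile(r"@([a-z0-9-]+(?:\.[a-z0-9-]+)+)")


def mention_edges(posts):
    """Count directed (author, mentioned_handle) pairs.

    `posts` is an iterable of (author_handle, post_text) tuples.
    Self-mentions are skipped.
    """
    edges = Counter()
    for author, text in posts:
        source = author.lower()
        for target in MENTION_PATTERN.findall(text.lower()):
            if target != source:
                edges[(source, target)] += 1
    return edges


# Example with made-up handles.
example_posts = [
    ("alice.bsky.social", "hi @bob.bsky.social and @carol.example.com"),
    ("bob.bsky.social", "thanks @alice.bsky.social"),
    ("alice.bsky.social", "again @bob.bsky.social."),
]
edges = mention_edges(example_posts)
```

The resulting counter can be handed to a graph library such as networkx as weighted edges, or simply sorted to see which pairs of accounts interact most.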

---
## Notes and Reflections
*Use this space to document your process, interesting findings, and AI interactions*

### What worked well:
-

### Challenges encountered:
-

### Most helpful AI interactions:
-

### Key insights from your analysis:
-

### Questions for further research:
-