<a href="https://colab.research.google.com/github/e3la/instagram2bepress/blob/main/InstagramtoDC.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Welcome to this Instagram to DC tool!

This project was created using the avant-garde technique of [vibecoding](https://en.wikipedia.org/wiki/Vibe_coding). I know enough Python to be dangerous. If anything about this feels wrong, if the vibe is off, be like a tree and leaf. I try to share what the code is doing (as explained to me by ChatGPT, Claude, or Gemini while I've been working on the project) and the code itself (likewise provided by AI). My end goal is to batch upload an archive of Instagram posts to Digital Commons.

"I"? We? AI and I? wrote this in Google Colab, which is an instance of Jupyter Notebook. It's a tool for sharing and working with Python alongside text that explains what is happening. There are text boxes and code boxes that help us on our journey. Buckle in! The first thing you'll need is a zip export from Instagram. Whoever has the keys to the Instagram account can visit https://www.instagram.com/download/request and get a zip. That's the file you'll need to start.

This looks complicated, but take a deep breath and know that the process is: upload a zip, then press 'play' on each code box in turn until you can download a zip that you can bring to Digital Commons for batch upload. If you need help with that bit ... get in touch with your support person there!

**What this code does:**

The code block below handles uploading your Instagram archive `.zip` file into this Colab notebook environment.

1.  `from google.colab import files`: Imports the necessary tools for file handling. - Okay, get ready, we're going to need the tools for moving files around!
2.  `uploaded = files.upload()`: **Displays an "Upload" button** below this cell.
    *   **==> Action Required <==**: Click the button and select your Instagram `.zip` file from your computer.
3.  The rest of the code finds the name of the file you uploaded, stores it in the `zip_filename` variable, and prints the filename as confirmation.

**Purpose:** To get your Instagram data file ready for the next steps in the notebook.
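
Before moving on, it can help to confirm that what you uploaded really is a zip archive. The sketch below is not part of the notebook's pipeline: `looks_like_zip` is a hypothetical helper name, it relies only on the standard library's `zipfile.is_zipfile`, and the demo bytes are built in memory purely for illustration.

```python
import io
import zipfile

def looks_like_zip(file_bytes):
    """Return True if the given bytes appear to be a zip archive."""
    return zipfile.is_zipfile(io.BytesIO(file_bytes))

# Build a tiny zip in memory to stand in for a real upload (illustrative only)
demo_buffer = io.BytesIO()
with zipfile.ZipFile(demo_buffer, 'w') as demo_zip:
    demo_zip.writestr('hello.txt', 'hi')
demo_bytes = demo_buffer.getvalue()

print(looks_like_zip(demo_bytes))
print(looks_like_zip(b'this is plain text, not a zip archive at all'))
```

In Colab, `files.upload()` returns a dictionary that maps each filename to its bytes, so after the upload cell has run you could try `looks_like_zip(uploaded[zip_filename])`.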

In [None]:
from google.colab import files

# Upload the zip file
uploaded = files.upload()

# Get the uploaded file name
for filename in uploaded.keys():
    zip_filename = filename
    print(f"Uploaded file: {zip_filename}")


**What the next code does:**

It takes the `.zip` file you uploaded and **unzips** it, revealing all the files and folders inside your Instagram archive. It then **prints a count** of the files it found, with separate totals for images and MP4 videos.

1.  `extract_dir = 'extracted_instagram_data'`: Sets up a dedicated folder name where the unzipped files will be placed. This keeps things organized.
2.  `with zipfile.ZipFile(...)`: Opens your uploaded `.zip` file.
3.  `zip_ref.extractall(extract_dir)`: **Extracts all contents** from the zip file into the `extracted_instagram_data` folder.
4.  `os.walk(extract_dir)`: This loop is the part that explores the newly created folder and all its subfolders.
5.  The `print()` statements report how many files were in the zip, along with how many of them are images and MP4 videos.

**Purpose:** To unpack your Instagram archive and give you an overview of its structure and all the individual data files (like photos, messages, profile info, etc.) it contains.

**Note:** Your Instagram archive might contain many files, so these totals could be quite large! You can also visually explore the `extracted_instagram_data` folder using the file browser icon (looks like a folder 📁) on the left sidebar of Colab.
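
If you'd like to peek inside an archive before unzipping it, the standard library can read the file listing straight from the zip. This is a minimal sketch rather than the notebook's own method: `summarize_zip` is a hypothetical helper, and the demo archive with its file names is made up for illustration.

```python
import io
import os
import zipfile
from collections import Counter

def summarize_zip(zip_source):
    """Count the files in a zip by extension without extracting anything."""
    counts = Counter()
    with zipfile.ZipFile(zip_source, 'r') as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # skip folder entries, count only files
            ext = os.path.splitext(info.filename)[1].lower() or '(no extension)'
            counts[ext] += 1
    return counts

# Made-up archive standing in for a real Instagram export
demo_archive = io.BytesIO()
with zipfile.ZipFile(demo_archive, 'w') as zf:
    zf.writestr('media/posts/one.jpg', b'fake image')
    zf.writestr('media/posts/two.JPG', b'fake image')
    zf.writestr('media/reels/clip.mp4', b'fake video')

demo_counts = summarize_zip(demo_archive)
for ext, count in sorted(demo_counts.items()):
    print(f"{ext}: {count}")
```

With a real upload you would pass the filename instead, for example `summarize_zip(zip_filename)`.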

In [None]:
import os
import zipfile

# Define a directory name to extract the contents into
extract_dir = 'extracted_instagram_data'

print(f"Unzipping '{zip_filename}' into a folder named '{extract_dir}'...")

# Use a try-except block to handle potential errors during unzipping
try:
    # Open the zip file in read mode ('r')
    with zipfile.ZipFile(zip_filename, 'r') as zip_ref:
        # Extract all the contents into the specified directory
        zip_ref.extractall(extract_dir)
    print(f"Successfully unzipped the archive.")

    file_count = 0
    image_count = 0
    video_count = 0

    # Walk through the directory tree (folders, subfolders, files)
    # Note: the loop variable is named `filenames`, not `files`, so it does not
    # overwrite the `files` helper imported from google.colab in the first cell.
    for root, dirs, filenames in os.walk(extract_dir):
        # Count all files within the current folder
        for name in filenames:
            file_count += 1
            if name.lower().endswith(('.png', '.jpg', '.jpeg', '.gif', '.bmp', '.webp')):
                image_count += 1
            elif name.lower().endswith('.mp4'):
                video_count += 1

    print("\n-----------------------------------------")
    print(f"Found {file_count} files in total.")
    print(f"Images: {image_count}")
    print(f"MP4 videos: {video_count}")
    print("You can also browse these files using the 'Files' panel on the left sidebar.")
    print("\n-----------------------------------------")

except zipfile.BadZipFile:
    print(f"Error: The file '{zip_filename}' does not seem to be a valid zip archive. Please check the file and try uploading again.")
except FileNotFoundError:
    print(f"Error: Could not find the file '{zip_filename}'. Was the upload in the previous step successful?")
except Exception as e:
    print(f"An unexpected error occurred: {e}")

Unzipping 'instagram-umsllibraries-2025-03-07-Ardjbhx1.zip' into a folder named 'extracted_instagram_data'...
Successfully unzipped the archive.

-----------------------------------------
Found 1445 files in total.
Images: 1229
MP4 videos: 103
You can also browse these files using the 'Files' panel on the left sidebar.

-----------------------------------------


**What this next code block does:**

This block focuses on **loading the data** from the three most relevant JSON files for understanding media content: `posts_1.json`, `reels.json`, and `stories.json`. These files contain metadata like captions, timestamps, and file paths for your posts, reels, and stories.

1.  `import json`: Imports tools for reading JSON data.
2.  **Find Activity Folder:** It first tries to automatically find the main folder within your unzipped archive (usually named after your username, like `syracuseuniversitylibraries_20231027`) by looking for a common file (`account_information.json`). If it can't find it, it defaults to `your_instagram_activity` (you might need to edit the code if this default is wrong for your archive).
3.  **Define Paths:** It constructs the exact path to where these JSON files *should* be located (inside the `media` subfolder within the main activity folder).
4.  **Load Files Loop:** It then attempts to:
    *   **Check:** Verify if each file (`posts_1.json`, `reels.json`, `stories.json`) actually exists at the expected location.
    *   **Open & Read:** If a file exists, it opens it using the correct `utf-8` encoding (important for special characters/emojis).
    *   **Parse JSON:** It uses the `json.load()` function to convert the raw text from the file into a structured Python format (usually a list of dictionaries).
    *   **Store Data:** The loaded data for each type (posts, reels, stories) is stored in a central dictionary called `instagram_media_data`. You can access the posts data later using `instagram_media_data['posts']`, reels using `instagram_media_data['reels']`, etc.
5.  **Handle Missing Files/Errors:** If a file is missing (e.g., you have no Reels, so `reels.json` doesn't exist) or if there's an error reading a file (e.g., it's corrupted), it prints a message and stores `None` for that data type, preventing the script from crashing.
6.  **Summary:** Finally, it prints a summary showing which files were successfully loaded and how many entries were found in each.

**Purpose:** To extract the key metadata about posts, reels, and stories from their respective JSON files and load it into Python variables (`instagram_media_data['posts']`, `instagram_media_data['reels']`, `instagram_media_data['stories']`) so we can analyze or process it in later steps for potential repository migration.
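
The "flattening" step is easier to picture with a toy example. The sketch below assumes, as the loader in the next cell does, that a grouped entry is a dictionary holding a `media` list. `flatten_media` is a hypothetical helper and the sample entries are made up. Unlike the notebook's loader, it also keeps entries that are not grouped instead of skipping them.

```python
def flatten_media(entries):
    """Return a flat list of media items from a list that may contain groups."""
    flat = []
    for entry in entries:
        if isinstance(entry, dict) and isinstance(entry.get('media'), list):
            # Grouped entry: pull every item out of its 'media' list
            flat.extend(entry['media'])
        else:
            # Already a single item: keep it as-is
            flat.append(entry)
    return flat

# Made-up sample: one grouped entry followed by one flat entry
sample_entries = [
    {"media": [{"uri": "media/reels/a.mp4"}, {"uri": "media/reels/b.mp4"}]},
    {"uri": "media/stories/c.jpg"},
]

flat_items = flatten_media(sample_entries)
print(len(flat_items))
print([item["uri"] for item in flat_items])
```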

In [None]:
import json

# --- Configuration ---
potential_base_dir = extract_dir  # Assumes this is defined earlier
activity_folder_name = None

# Detect main Instagram activity folder
for item in os.listdir(potential_base_dir):
    item_path = os.path.join(potential_base_dir, item)
    if os.path.isdir(item_path) and 'account_information.json' in os.listdir(item_path):
        activity_folder_name = item
        print(f"Automatically detected activity folder: '{activity_folder_name}'")
        break

# Fallback default
if not activity_folder_name:
    activity_folder_name = 'your_instagram_activity'
    print(f"Using default: '{activity_folder_name}'.")

# Path to media folder
media_path = os.path.join(potential_base_dir, activity_folder_name, 'media')
print(f"\nLooking for JSON files inside: '{media_path}'")

# Files we want to parse
files_to_load = {
    "posts": os.path.join(media_path, 'posts_1.json'),
    "reels": os.path.join(media_path, 'reels.json'),
    "stories": os.path.join(media_path, 'stories.json')
}

instagram_media_data = {}

# --- Load and normalize JSON ---
print("\nAttempting to load media:")

for media_type, file_path in files_to_load.items():
    print(f" -> {media_type.title()}: '{os.path.basename(file_path)}'")

    try:
        if not os.path.exists(file_path):
            print(f"    ⚠️ File not found. Skipping.")
            instagram_media_data[media_type] = None
            continue

        with open(file_path, 'r', encoding='utf-8') as f:
            data = json.load(f)

        if isinstance(data, list):  # For posts_1.json
            instagram_media_data[media_type] = data
            print(f"    ✅ Loaded {len(data)} entries.")

        elif isinstance(data, dict):
            # Handle known top-level keys
            top_level_key = {
                "reels": "ig_reels_media",
                "stories": "ig_stories"
            }.get(media_type)

            if top_level_key in data:
                # Check if it's a flat list or list of groups
                first_entry = data[top_level_key][0] if data[top_level_key] else {}

                if isinstance(first_entry, dict) and 'media' in first_entry:
                    # Flatten nested media lists
                    flat_media = []
                    for group in data[top_level_key]:
                        if isinstance(group, dict) and 'media' in group:
                            flat_media.extend(group['media'])
                    instagram_media_data[media_type] = flat_media
                    print(f"    ✅ Loaded {len(flat_media)} flattened entries.")
                elif isinstance(data[top_level_key], list):
                    # Already flat list
                    instagram_media_data[media_type] = data[top_level_key]
                    print(f"    ✅ Loaded {len(data[top_level_key])} entries.")
                else:
                    print(f"    ⚠️ Unsupported structure for {media_type}.")
                    instagram_media_data[media_type] = None
            else:
                print(f"    ⚠️ Expected key '{top_level_key}' not found.")
                instagram_media_data[media_type] = None

        else:
            print(f"    ❌ Unsupported JSON structure in {media_type}.")
            instagram_media_data[media_type] = None

    except json.JSONDecodeError:
        print(f"    ❌ Could not parse JSON (corrupt?).")
        instagram_media_data[media_type] = None
    except Exception as e:
        print(f"    ❌ Error: {e}")
        instagram_media_data[media_type] = None

# --- Summary ---
print("\n--- Summary ---")
for media_type, data in instagram_media_data.items():
    if isinstance(data, list):
        print(f"- {media_type.capitalize()}: ✅ {len(data)} entries loaded")
    else:
        print(f"- {media_type.capitalize()}: ❌ Not loaded or no valid entries")


Using default: 'your_instagram_activity'.

Looking for JSON files inside: 'extracted_instagram_data/your_instagram_activity/media'

Attempting to load media:
 -> Posts: 'posts_1.json'
    ✅ Loaded 646 entries.
 -> Reels: 'reels.json'
    ✅ Loaded 25 flattened entries.
 -> Stories: 'stories.json'
    ✅ Loaded 485 entries.

--- Summary ---
- Posts: ✅ 646 entries loaded
- Reels: ✅ 25 entries loaded
- Stories: ✅ 485 entries loaded


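This section grabs your Instagram handle from the name of the zip file you uploaded. The export here is named `instagram-umsllibraries-2025-03-07-Ardjbhx1.zip`, so the handle is the text between `instagram-` and the next hyphen.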
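
The same idea can be written as a small function using a regular expression. This is a sketch that assumes export filenames follow the pattern `instagram-<handle>-<YYYY-MM-DD>-<suffix>.zip`, which is inferred from the single example in this notebook. `extract_handle` is a hypothetical helper name.

```python
import re

def extract_handle(zip_name):
    """Pull the handle out of an export filename, or return None if it doesn't match."""
    # Assumed pattern: instagram-<handle>-<YYYY-MM-DD>-<anything>
    match = re.match(r'^instagram-([A-Za-z0-9._]+)-\d{4}-\d{2}-\d{2}-', zip_name)
    return match.group(1) if match else None

print(extract_handle("instagram-umsllibraries-2025-03-07-Ardjbhx1.zip"))
print(extract_handle("my_backup.zip"))
```

Returning `None` for an unexpected filename makes it easy to stop early and type the handle in by hand instead.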

In [None]:
# Example filename string (replace with your actual variable if needed)
# zip_filename = "..." # Assuming this variable holds the filename
#zip_filename = "instagram-umsllibraries-2025-03-07-Ardjbhx1.zip"

# Remove the 'instagram-' prefix if it exists
if zip_filename.startswith("instagram-"):
    temp_name = zip_filename[len("instagram-"):] # Get everything after 'instagram-'
else:
    temp_name = zip_filename # Handle cases where it might not start with 'instagram-'

# Split the remaining string by the hyphen '-'
parts = temp_name.split('-', 1) # Split only on the *first* hyphen found

# The desired part is the first element after the split
# (split() always returns at least one element, so check that it is not empty)
if parts[0]:
    instausername = parts[0]
    print(f"Username: {instausername}")
else:
    print("Could not extract using split method (unexpected format).")

Username: umsllibraries


In [None]:
# prompt: I want to export a zip of all the reels, posts and stories with a csv metadata file and readme

import csv
import os

# --- Create README ---
readme_content = """
This archive contains Instagram data exported from the Instagram app.
The archive includes:

- **Reels:** Video posts.
- **Posts:** Photos and videos.
- **Stories:** Photo and video stories.
- **metadata.csv:** A CSV file containing metadata for the exported media.
"""

# --- Create metadata.csv ---
metadata_file = "metadata.csv"
with open(metadata_file, "w", newline="", encoding="utf-8") as csvfile:
    fieldnames = ["media_type", "filename", "timestamp", "caption"]
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()

    for media_type, data in instagram_media_data.items():
        if data:
          for item in data:
              try:
                if media_type == "posts":
                    filename = item.get("image_versions2", {}).get("candidates", [{}])[0].get("url")
                    timestamp = item.get("taken_at")
                    caption = item.get("caption", {}).get("text")
                elif media_type == "reels":
                    filename = item.get("video_versions", [{}])[0].get("url")
                    timestamp = item.get("taken_at")
                    caption = item.get("caption", {}).get("text")
                elif media_type == "stories":
                    filename = item.get("image_versions2", {}).get("candidates", [{}])[0].get("url")
                    timestamp = item.get("taken_at")
                    caption = item.get("caption", {}).get("text")

                writer.writerow({
                    "media_type": media_type,
                    "filename": filename,
                    "timestamp": timestamp,
                    "caption": caption,
                })
              except Exception as e:
                print(f"Error processing {media_type} item: {e}")



# --- Create and populate the zip archive ---
zip_filename = f"{instausername}_instagram_archive.zip"
with zipfile.ZipFile(zip_filename, "w", zipfile.ZIP_DEFLATED) as zipf:

    # Add README
    zipf.writestr("README.txt", readme_content)

    # Add metadata
    zipf.write(metadata_file)


    # Add media files (reels, posts, stories)
    for media_type, data in instagram_media_data.items():
      if data:
        for item in data:
          try:
            if media_type == "posts":
              filename = item.get("image_versions2", {}).get("candidates", [{}])[0].get("url")
            elif media_type == "reels":
              filename = item.get("video_versions", [{}])[0].get("url")
            elif media_type == "stories":
              filename = item.get("image_versions2", {}).get("candidates", [{}])[0].get("url")
            # Assuming 'filename' is a direct path or URL, adjust as needed for your file structure
            if filename and os.path.exists(filename):
              zipf.write(filename, arcname=os.path.basename(filename)) # Add to zip preserving filename
          except Exception as e:
              print(f"Error adding file to zip: {e}")

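# --- Download the zip archive ---
# (Re-)import the Colab helper in case an earlier loop variable named `files`
# has replaced it, which causes
# "AttributeError: 'list' object has no attribute 'download'" on the next line.
from google.colab import files
files.download(zip_filename)
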

In [None]:
import csv
from datetime import datetime
from shutil import copy2

# --- Instagram handle (extracted from the zip filename in an earlier cell) ---
instagram_handle = instausername

# --- Configuration ---
media_type = 'reels'
output_dir = os.path.join(extract_dir, f'{media_type}_export')
os.makedirs(output_dir, exist_ok=True)

# CSV and ZIP file paths
csv_path = os.path.join(output_dir, f'{media_type}_metadata.csv')
readme_path = os.path.join(output_dir, 'README.txt')
zip_path = os.path.join(output_dir, f'{media_type}_package.zip')

# Get loaded Reels data
reels_data = instagram_media_data.get(media_type)
if not reels_data:
    print("No Reels data loaded.")
else:
    print(f"Preparing to export {len(reels_data)} Reels...")

    # 1. Write CSV
    with open(csv_path, mode='w', newline='', encoding='utf-8') as csvfile:
        fieldnames = ['filename', 'creation_date', 'original_uri', 'subtitles_uri']
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()

        # Keep track of copied files for ZIP
        copied_files = []

        for i, item in enumerate(reels_data):
            uri = item.get('uri')
            if not uri:
                continue

            media_path = os.path.join(extract_dir, uri)
            if not os.path.exists(media_path):
                print(f"Warning: File not found - {media_path}")
                continue

            # Get timestamp
            timestamp = item.get('creation_timestamp')
            date_str = datetime.fromtimestamp(timestamp).strftime('%Y-%m-%d') if timestamp else 'unknown-date'

            # Prepare export filename using handle and date
            ext = os.path.splitext(uri)[-1]
            export_filename = f"instagram_{instagram_handle}_reel_{date_str}_{i+1}_{os.path.splitext(os.path.basename(uri))[0]}{ext}"
            export_path = os.path.join(output_dir, export_filename)
            copy2(media_path, export_path)
            copied_files.append(export_path)

            # Subtitles (if any)
            subtitles_uri = ''
            try:
                subtitles_uri = item['media_metadata']['video_metadata']['subtitles']['uri']
            except KeyError:
                pass

            writer.writerow({
                'filename': export_filename,
                'creation_date': date_str,
                'original_uri': uri,
                'subtitles_uri': subtitles_uri
            })

    print(f"Metadata written to CSV: {csv_path}")

    # 2. Write README file
    with open(readme_path, 'w', encoding='utf-8') as f:
        f.write(f"""Instagram Reels Export Package
===============================

Handle: @{instagram_handle}
Exported: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}

This package contains:
- All exported Reels in video format.
- A CSV file (reels_metadata.csv) with metadata for each reel.
- This README.txt file for context.

Fields in the CSV:
- filename: The name of the exported video file.
- creation_date: ISO-formatted creation date of the reel.
- original_uri: The original relative URI from the Instagram archive.
- subtitles_uri: If available, the URI of associated subtitles.

Generated by your local Instagram archive helper script.
""")

    # 3. Zip everything up
    with zipfile.ZipFile(zip_path, 'w') as zipf:
        zipf.write(csv_path, arcname=os.path.basename(csv_path))
        zipf.write(readme_path, arcname='README.txt')
        for file_path in copied_files:
            zipf.write(file_path, arcname=os.path.basename(file_path))

    print(f"ZIP archive created: {zip_path}")


Preparing to export 25 Reels...
Metadata written to CSV: extracted_instagram_data/reels_export/reels_metadata.csv
ZIP archive created: extracted_instagram_data/reels_export/reels_package.zip
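
A note on the `date_str` line in the cell above: `datetime.fromtimestamp(timestamp)` converts using the time zone of the machine running the code, so a reel posted near midnight could land on a different date depending on where the notebook runs. Here is a minimal sketch of one alternative, assuming UTC dates are acceptable for your metadata. `timestamp_to_date` is a hypothetical helper, not something the notebook defines.

```python
from datetime import datetime, timezone

def timestamp_to_date(timestamp):
    """Convert a Unix timestamp (seconds) to a YYYY-MM-DD string in UTC."""
    if timestamp is None:
        return 'unknown-date'
    return datetime.fromtimestamp(timestamp, tz=timezone.utc).strftime('%Y-%m-%d')

print(timestamp_to_date(1741305600))  # midnight UTC on 7 March 2025
print(timestamp_to_date(None))
```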


In [None]:
import csv
from datetime import datetime
from shutil import copy2

# --- Instagram handle (extracted from the zip filename in an earlier cell) ---
instagram_handle = instausername
if not instagram_handle:
    raise ValueError("Instagram handle cannot be empty.")

# --- Configuration ---
media_type = 'reels'
output_dir = os.path.join(extract_dir, f'{media_type}_export')
os.makedirs(output_dir, exist_ok=True)

# File paths
csv_path = os.path.join(output_dir, f'{media_type}_metadata.csv')
readme_path = os.path.join(output_dir, 'README.txt')
zip_path = os.path.join(output_dir, f'{media_type}_package.zip')

# Get loaded Reels data
reels_data = instagram_media_data.get(media_type)
if not reels_data:
    print("No Reels data loaded.")
else:
    print(f"Preparing to export {len(reels_data)} Reels...")

    copied_files = []

    # 1. Write metadata CSV
    with open(csv_path, mode='w', newline='', encoding='utf-8') as csvfile:
        fieldnames = ['filename', 'creation_date', 'original_uri', 'subtitles_uri', 'subtitles_file']
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()

        for i, item in enumerate(reels_data):
            uri = item.get('uri')
            if not uri:
                continue

            media_path = os.path.join(extract_dir, uri)
            if not os.path.exists(media_path):
                print(f"Warning: File not found - {media_path}")
                continue

            # Format creation date
            timestamp = item.get('creation_timestamp')
            date_str = datetime.fromtimestamp(timestamp).strftime('%Y-%m-%d') if timestamp else 'unknown-date'

            # Prepare export filenames
            ext = os.path.splitext(uri)[-1]
            base_filename = f"instagram_{instagram_handle}_reel_{date_str}_{i+1}"
            video_filename = base_filename + ext
            video_path = os.path.join(output_dir, video_filename)
            copy2(media_path, video_path)
            copied_files.append(video_path)

            # Handle subtitles
            subtitles_uri = ''
            subtitles_path = ''
            try:
                subtitles_uri = item['media_metadata']['video_metadata']['subtitles']['uri']
                original_subs_path = os.path.join(extract_dir, subtitles_uri)
                if os.path.exists(original_subs_path):
                    subtitles_path = os.path.join(output_dir, base_filename + '.srt')
                    copy2(original_subs_path, subtitles_path)
                    copied_files.append(subtitles_path)
                else:
                    print(f"Subtitles URI exists but file not found: {original_subs_path}")
            except KeyError:
                pass

            writer.writerow({
                'filename': video_filename,
                'creation_date': date_str,
                'original_uri': uri,
                'subtitles_uri': subtitles_uri,
                'subtitles_file': os.path.basename(subtitles_path) if subtitles_path else ''
            })

    print(f"Metadata written to CSV: {csv_path}")

    # 2. Write README file
    with open(readme_path, 'w', encoding='utf-8') as f:
        f.write(f"""Instagram Reels Export Package
===============================

Handle: @{instagram_handle}
Exported: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}

This package contains:
- Exported Reels in video format.
- Subtitles (SRT files) if available.
- A CSV file (reels_metadata.csv) with metadata.
- This README.txt file.

CSV Fields:
- filename: Exported video filename
- creation_date: Date the reel was created
- original_uri: Path in original archive
- subtitles_uri: Path to subtitle in original archive
- subtitles_file: Renamed .srt subtitle file (if present)

Generated by your custom Instagram archive export tool.
""")

    # 3. Zip it all up
    with zipfile.ZipFile(zip_path, 'w') as zipf:
        zipf.write(csv_path, arcname=os.path.basename(csv_path))
        zipf.write(readme_path, arcname='README.txt')
        for file_path in copied_files:
            zipf.write(file_path, arcname=os.path.basename(file_path))

    print(f"\n✅ ZIP archive created: {zip_path}")


Preparing to export 25 Reels...
Metadata written to CSV: extracted_instagram_data/reels_export/reels_metadata.csv

✅ ZIP archive created: extracted_instagram_data/reels_export/reels_package.zip
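
Before taking a package to Digital Commons, it may be worth checking that every file named in the CSV actually made it into the zip. The sketch below is one way to do that with the standard library. `missing_from_zip` is a hypothetical helper, the default CSV name and column match the reels cell above, and the demo package is made up for illustration.

```python
import csv
import io
import zipfile

def missing_from_zip(zip_source, csv_name='reels_metadata.csv', column='filename'):
    """List filenames that the CSV mentions but the zip does not contain."""
    with zipfile.ZipFile(zip_source) as zf:
        names_in_zip = set(zf.namelist())
        with zf.open(csv_name) as raw:
            # zf.open() yields bytes, so wrap it to read the CSV as text
            reader = csv.DictReader(io.TextIOWrapper(raw, encoding='utf-8', newline=''))
            return [row[column] for row in reader if row[column] not in names_in_zip]

# Made-up package: the CSV lists two videos but only one is included
demo_package = io.BytesIO()
with zipfile.ZipFile(demo_package, 'w') as zf:
    zf.writestr('reels_metadata.csv', 'filename,creation_date\na.mp4,2025-03-07\nb.mp4,2025-03-08\n')
    zf.writestr('a.mp4', b'fake video')

missing = missing_from_zip(demo_package)
print(missing)
```

For the real package you would pass its path, for example `missing_from_zip(zip_path)`. An empty list means nothing is missing.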


In [None]:
# (Re-)import the Colab helper in case an earlier loop variable named `files`
# has replaced it
from google.colab import files

# Zip everything up at the top level of /content so it is easy to find
zip_path = '/content/reels_package.zip'
with zipfile.ZipFile(zip_path, 'w') as zipf:
    zipf.write(csv_path, arcname='reels_metadata.csv')
    # copied_files holds the renamed videos plus any .srt subtitle files
    for file_path in copied_files:
        zipf.write(file_path, arcname=os.path.basename(file_path))

print(f"ZIP archive created: {zip_path}")

# Trigger the download in your browser
if os.path.exists(zip_path):
    print(f"\nAttempting to download '{os.path.basename(zip_path)}'...")
    print("Check your browser's downloads!")
    files.download(zip_path)
else:
    print(f"\nError: File '{zip_path}' was not found. Cannot download.")

ZIP archive created: /content/reels_package.zip
