# V2EX Post Analysis Notebook

This notebook processes V2EX post data by applying LLM-based analysis to each chunk of text within the JSON files. It adds the analysis results to each chunk and writes the modified data to new JSON files.
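
The per-chunk step described above can be sketched as follows. This is a minimal illustration, not the code in `analyze_posts.py`: the JSON keys `chunks`, `text`, and `analysis`, the helper name `annotate_file`, and the stub analyzer are all assumptions made for the example, and the real schema and model call may differ.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def annotate_file(src: Path, dst: Path, analyze: Callable[[str], str]) -> int:
    """Attach an analysis result to every chunk in src and write the result to dst.

    Returns the number of chunks processed.
    """
    data = json.loads(src.read_text(encoding="utf-8"))
    chunks = data.get("chunks", [])  # hypothetical key; the real schema may differ
    for chunk in chunks:
        # Hypothetical field names: read "text", store the result under "analysis".
        chunk["analysis"] = analyze(chunk["text"])
    dst.parent.mkdir(parents=True, exist_ok=True)
    # ensure_ascii=False keeps non-ASCII post text readable in the output file.
    dst.write_text(json.dumps(data, ensure_ascii=False, indent=2), encoding="utf-8")
    return len(chunks)


# Demo with a stub analyzer standing in for the real LLM call.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "post.json"
    dst = Path(tmp) / "analyzed" / "post.json"
    src.write_text(
        json.dumps({"chunks": [{"text": "hello"}, {"text": "world"}]}),
        encoding="utf-8",
    )
    count = annotate_file(src, dst, analyze=lambda text: f"{len(text)} characters")
    result = json.loads(dst.read_text(encoding="utf-8"))
```

Passing the analyzer in as a callable keeps the file handling separate from the model, so the same loop can be exercised with a stub before a real model is loaded.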

## Setup Google Drive

First, let's mount Google Drive and save our work there. Files on Drive persist if the Colab session is interrupted, whereas files on the runtime's local disk are lost when the runtime is recycled.

In [None]:
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

# Create a directory for our project in Google Drive
!mkdir -p /content/drive/MyDrive/v2ex_analysis/posts_json_analyzed

## Setup Dependencies

Now, we'll install the required dependencies and set up the environment.

In [None]:
# Install required packages
!pip install torch transformers tqdm

## Download Files

Next, we'll clone the GitHub repository, which contains both the analysis script and the input JSON files.

In [None]:
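# Clone the repository to get the JSON files and script
# (skipped if a previous run of this cell already cloned it)
![ -d /content/v2ex-digest-pages ] || git clone https://github.com/baoliqi/v2ex-digest-pages.git /content/v2ex-digest-pages

# Change to the repository directory
%cd /content/v2ex-digest-pages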

## Check Files

Let's check that we have the necessary files.

In [None]:
# Check that we have the analysis script
!ls -la analyze_posts.py

# Check that we have the JSON files
!ls -la docs/posts_json/ | head -n 10

# Check if we already have any analyzed files in Google Drive
!ls -la /content/drive/MyDrive/v2ex_analysis/posts_json_analyzed/ || echo "No files yet"

## Run Analysis

Now we'll run the analysis script to process the JSON files. Without the `--force` flag, the script skips any file whose output already exists, so an interrupted run can be resumed without redoing finished work.

In [None]:
# Run the analysis script with output to Google Drive
# Note: We don't use the --force flag so it will skip files that already exist
!python analyze_posts.py --output_dir /content/drive/MyDrive/v2ex_analysis/posts_json_analyzed
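
The skip behaviour can be pictured with a short sketch. It assumes that each output file has the same name as its input file; that naming rule and the helper `pending_files` are assumptions for illustration, and the actual logic in `analyze_posts.py` may differ.

```python
import tempfile
from pathlib import Path


def pending_files(input_dir: Path, output_dir: Path, force: bool = False) -> list[Path]:
    """Return the input files that still need processing.

    Assumes an output file shares its input file's name (an assumption,
    not confirmed by the script). With force=True, nothing is skipped.
    """
    inputs = sorted(input_dir.glob("*.json"))
    if force:
        return inputs
    return [path for path in inputs if not (output_dir / path.name).exists()]


# Demo: three inputs, one of which already has an output.
with tempfile.TemporaryDirectory() as tmp:
    in_dir = Path(tmp) / "posts_json"
    out_dir = Path(tmp) / "posts_json_analyzed"
    in_dir.mkdir()
    out_dir.mkdir()
    for name in ("a.json", "b.json", "c.json"):
        (in_dir / name).write_text("{}", encoding="utf-8")
    (out_dir / "b.json").write_text("{}", encoding="utf-8")

    todo = [p.name for p in pending_files(in_dir, out_dir)]
    todo_forced = [p.name for p in pending_files(in_dir, out_dir, force=True)]
```

Under these assumptions, `todo` lists only the files without an output, which is why re-running the cell after a disconnect picks up where the previous run stopped.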

## Check Results

Let's check the results to make sure the analysis was successful.

In [None]:
# Check that we have the output files in Google Drive
!ls -la /content/drive/MyDrive/v2ex_analysis/posts_json_analyzed/ | head -n 10
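
A directory listing shows that files exist, not that they are complete. If a session was interrupted mid-write, an output file could be truncated. The sketch below is one way to catch that by checking that every output parses as JSON; the helper name `find_invalid_json` is hypothetical, and the demo uses a temporary directory rather than the Drive folder.

```python
import json
import tempfile
from pathlib import Path


def find_invalid_json(directory: Path) -> list[Path]:
    """Return the .json files in directory that fail to parse."""
    invalid = []
    for path in sorted(directory.glob("*.json")):
        try:
            json.loads(path.read_text(encoding="utf-8"))
        except (json.JSONDecodeError, UnicodeDecodeError):
            invalid.append(path)
    return invalid


# Demo: one complete file and one that was cut off mid-write.
with tempfile.TemporaryDirectory() as tmp:
    directory = Path(tmp)
    (directory / "ok.json").write_text('{"chunks": []}', encoding="utf-8")
    (directory / "cut_off.json").write_text('{"chunks": [', encoding="utf-8")
    invalid = [p.name for p in find_invalid_json(directory)]
```

In the notebook, the same function could be pointed at `Path("/content/drive/MyDrive/v2ex_analysis/posts_json_analyzed")`. Any file it reports can be deleted so that the next run regenerates it.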

## Download Results

Finally, let's zip the results and download them to the local machine.

In [None]:
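# Create a zip file of the results from Google Drive
# (cd first so the archive contains posts_json_analyzed/ rather than the full /content/drive/... path)
!cd /content/drive/MyDrive/v2ex_analysis && zip -r /content/posts_json_analyzed.zip posts_json_analyzed

# Download the zip file
from google.colab import files
files.download('/content/posts_json_analyzed.zip')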