# Scraping LinkedIn Comments Using the MultiOn API

In this recipe, we scrape the comments on a LinkedIn post, given the post's URL. The example calls the MultiOn API endpoints directly rather than going through the SDK.

## Prerequisites

- Python 3.6 or higher
- `requests` library
- MultiOn API key

## Step 1

1. Install the required library:

   ```
   pip install requests
   ```

2. Get your MultiOn API Key: https://app.multion.ai/api-keys

In [None]:
%pip install requests


## Step 2

Replace `your_multion_api_key_here` with your actual MultiOn API key:

   ```python
   API_KEY = "your_multion_api_key_here"
   ```
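
As an alternative to pasting the key into the notebook, the sketch below reads it from an environment variable. The helper `load_api_key` and the variable name `MULTION_API_KEY` are assumptions made for this illustration; they are not part of the MultiOn API or SDK.

```python
import os


def load_api_key(env_var="MULTION_API_KEY", environ=None):
    """Return the API key stored in an environment variable.

    `env_var` is a hypothetical name chosen for this sketch.
    `environ` lets a caller pass a plain dict instead of os.environ.
    """
    env = os.environ if environ is None else environ
    key = env.get(env_var, "").strip()
    if not key:
        # Fail early with a clear message instead of sending an empty key
        raise RuntimeError(
            f"Set the {env_var} environment variable to your MultiOn API key"
        )
    return key
```

With this in place, `API_KEY = load_api_key()` could stand in for the hardcoded assignment, which keeps the key out of a shared or committed notebook.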

In [None]:

import requests
import json
import logging
import time

# Set up logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

# TODO: Replace with your actual MultiOn API key
API_KEY = "your_multion_api_key_here"


## Step 3

Run the next 2 cells containing:
   - MultiOn API connection definitions
   - Use case-specific functions for LinkedIn comment scraping

In [None]:
## MULTION API FUNCTIONS

# Create a session
def create_session(url):
    endpoint = "https://api.multion.ai/v1/web/session/"
    headers = {
        "X_MULTION_API_KEY": API_KEY,
        "Content-Type": "application/json"
    }
    payload = {
        "url": url,
        "local": False,  # Set to True if you want to run the agent locally
        "include_screenshot": True,  # Set to False if you do not need screenshots
        "optional_params": {
            "source": "playground"
        }
    }
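    try:
        response = requests.post(endpoint, json=payload, headers=headers, timeout=60)
        response.raise_for_status()
        return response.json()
    except (requests.RequestException, ValueError) as e:
        # Return an empty dict so the caller's 'session_id' check fails cleanly
        logging.error(f"Failed to create session: {e}")
        return {}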

# Retrieve data from the MultiOn API
def retrieve(url, cmd, session_id, fields=None, max_items=100, timeout=60):
    logging.info(f"Retrieving data from URL: {url}")
    endpoint = "https://api.multion.ai/v1/web/retrieve"
    headers = {
        "X_MULTION_API_KEY": API_KEY,
        "Content-Type": "application/json"
    }
    payload = {
        "url": url,
        "cmd": cmd,
        "session_id": session_id,
        "local": False, # Set to True if you want to run the agent locally
        "format": "json",
        "max_items": max_items,
        "full_page": True,
        "render_js": True,
        "scroll_to_bottom": True,
        "mode": "standard",
        "optional_params": {
            "source": "playground"
        }
    }
    if fields:
        payload["fields"] = fields

    logging.info("Sending request to MultiOn API")
    try:
        response = requests.post(endpoint, json=payload, headers=headers, timeout=timeout)
        logging.info(f"Received response with status code: {response.status_code}")
        response.raise_for_status()
        return response.json()
    except Exception as e:
        logging.error(f"An error occurred: {str(e)}")
        return {"status": "ERROR", "message": str(e)}

In [None]:
## SPECIFIC FUNCTIONS FOR THIS USE CASE

# Extract comments from a LinkedIn post
def extract_linkedin_comments(post_url):
    logging.info(f"Extracting comments from LinkedIn post: {post_url}")
    
    # Create a session
    session = create_session(post_url)
    if 'session_id' not in session:
        logging.error("Failed to create session")
        return []
    
    session_id = session['session_id']
    logging.info(f"Session created with ID: {session_id}")
    
    # Define the command to extract comments
    cmd = """
    Extract all comments from the LinkedIn post.
    For each comment, provide the following information:
    1. Commenter's name
    2. Commenter's headline (if available)
    3. Comment text
    4. Timestamp of the comment
    """
    
    fields = ["commenter_name", "commenter_headline", "comment_text", "timestamp"]
    
    # Use the session_id in the retrieve call
    response = perform_action_with_retry(lambda: retrieve(post_url, cmd, session_id, fields))
    
    if response.get('status') == 'DONE':
        logging.info(f"Successfully extracted {len(response.get('data', []))} comments")
        return response.get('data', [])
    else:
        logging.error(f"Failed to extract comments. Status: {response.get('status')}")
        return []

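# Retry mechanism to handle errors
def perform_action_with_retry(action, max_retries=3, delay=5):
    # Fallback result in case max_retries is 0 and the action never runs
    result = {"status": "ERROR", "message": "No attempts were made"}
    for attempt in range(max_retries):
        result = action()
        if result.get('status') != 'ERROR':
            return result
        # Only wait if another attempt will follow
        if attempt < max_retries - 1:
            logging.warning(f"Attempt {attempt + 1} failed. Retrying in {delay} seconds...")
            time.sleep(delay)
    logging.error(f"Action failed after {max_retries} attempts.")
    return result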



## Step 4

Set the LinkedIn post URL you want to scrape:

   ```python
   linkedin_post_url = "https://www.linkedin.com/feed/update/urn:li:activity:7238971183105241091/"
   ```
    
Replace the URL with the actual LinkedIn post you want to scrape.

In [None]:
# TODO: Replace with the actual LinkedIn post URL
linkedin_post_url = "https://www.linkedin.com/feed/update/urn:li:activity:7238971183105241091/"

## Step 5

Run the main execution cell, which extracts the comments from the LinkedIn post and prints them as JSON.

In [None]:
## MAIN EXECUTION 

if __name__ == "__main__":
   
    logging.info("Starting comment extraction process")
    comments = extract_linkedin_comments(linkedin_post_url)
    
    # Print the comments as JSON
    print(json.dumps(comments, ensure_ascii=False, indent=2))
    
    logging.info(f"Extracted {len(comments)} comments")
    logging.info("Comment extraction process completed")



## Output

The JSON output is an array of comment objects, each with the following structure:

```json
{
  "commenter_name": "John Doe",
  "commenter_headline": "Software Engineer",
  "comment_text": "Great post!",
  "timestamp": "2023-04-15T14:30:00Z"
}
```
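
Once the comments are available as a list of dictionaries, they can be saved in other formats. The following is a minimal sketch that converts such a list to CSV text. The helper name `comments_to_csv` is hypothetical, and the sample record is the illustrative one shown above, not real scraped data.

```python
import csv
import io

# Column order follows the `fields` list used in extract_linkedin_comments()
FIELDNAMES = ["commenter_name", "commenter_headline", "comment_text", "timestamp"]


def comments_to_csv(comments):
    """Convert a list of comment dicts to CSV text (hypothetical helper)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES, extrasaction="ignore")
    writer.writeheader()
    for comment in comments:
        # Missing keys become empty cells instead of raising an error
        writer.writerow({name: comment.get(name, "") for name in FIELDNAMES})
    return buffer.getvalue()


sample_comments = [
    {
        "commenter_name": "John Doe",
        "commenter_headline": "Software Engineer",
        "comment_text": "Great post!",
        "timestamp": "2023-04-15T14:30:00Z",
    }
]
csv_text = comments_to_csv(sample_comments)
print(csv_text)
```

To write a file instead, the returned string can be saved with `open("comments.csv", "w", newline="", encoding="utf-8")`.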

## Customization

You can modify the `cmd` variable in the `extract_linkedin_comments()` function to change what data is extracted from each comment.
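
One way to keep the prompt and the `fields` list in sync when customizing is to generate both from a single mapping. The sketch below assumes a hypothetical helper `build_command`; the extra `reaction_count` field is an illustration only, and whether the API returns it depends on what the page exposes.

```python
def build_command(field_descriptions):
    """Build the prompt text and the matching field list from one mapping.

    `field_descriptions` maps a field name to the human-readable
    description that appears in the numbered list of the prompt.
    """
    lines = [
        "Extract all comments from the LinkedIn post.",
        "For each comment, provide the following information:",
    ]
    for index, description in enumerate(field_descriptions.values(), start=1):
        lines.append(f"{index}. {description}")
    return "\n".join(lines), list(field_descriptions)


FIELD_DESCRIPTIONS = {
    "commenter_name": "Commenter's name",
    "commenter_headline": "Commenter's headline (if available)",
    "comment_text": "Comment text",
    "timestamp": "Timestamp of the comment",
    # Hypothetical extra field, added here only to show the pattern
    "reaction_count": "Number of reactions on the comment (if shown)",
}

cmd, fields = build_command(FIELD_DESCRIPTIONS)
print(cmd)
print(fields)
```

The resulting `cmd` and `fields` values could then replace the hardcoded ones inside `extract_linkedin_comments()`.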

## Error Handling

The script retries a failed API call up to three times, waiting five seconds between attempts. If errors persist, check your API key and internet connection.
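
The retry helper in this recipe waits a fixed delay between attempts. A common variation is exponential backoff, where the wait doubles after each failure. The sketch below is one possible version and not part of the original script; `retry_with_backoff`, `base_delay`, and the injectable `sleep` parameter are names introduced for this illustration.

```python
import time


def retry_with_backoff(action, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Call `action` until its result status is not 'ERROR'.

    Waits base_delay, then 2 * base_delay, then 4 * base_delay, and so on
    between attempts. `sleep` can be swapped out, for example in tests.
    """
    result = {"status": "ERROR", "message": "action was never attempted"}
    for attempt in range(max_retries):
        result = action()
        if result.get("status") != "ERROR":
            return result
        # Skip the wait after the final attempt
        if attempt < max_retries - 1:
            sleep(base_delay * 2 ** attempt)
    return result
```

It takes the same kind of zero-argument callable as `perform_action_with_retry`, so a call such as `retry_with_backoff(lambda: retrieve(post_url, cmd, session_id, fields))` would fit the existing code.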

## Note

Be respectful of LinkedIn's terms of service and do not use this script for excessive scraping or any prohibited activities.