# Scraping LinkedIn with MultiOn

In this recipe, we'll demonstrate how to scrape social network data from LinkedIn using MultiOn's Agent API. We'll use `step` to navigate LinkedIn in local mode and `retrieve` to extract all profiles relevant to MultiOn, first from the search results and then from each individual profile page.

## Step 1: Set up the environment

First, let's install the `multion` package and set up the MultiOn client.

In [None]:
%pip install multion

In [3]:
from multion.client import MultiOn

client = MultiOn(
    api_key="YOUR_API_KEY" # Get your API key from https://app.multion.ai/api-keys
)
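
Pasting the key directly into a notebook makes it easy to leak when the notebook is shared. As a minimal sketch, the key could instead be read from an environment variable. Note that the helper `load_api_key` and the variable name `MULTION_API_KEY` are assumptions made for this example, not part of the MultiOn SDK.

```python
import os


def load_api_key(env_var="MULTION_API_KEY"):
    """Return the API key stored in `env_var`, or raise if it is unset or blank.

    Both this helper and the default variable name are hypothetical,
    chosen for this sketch only.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return api_key
```

With this in place, the client could be created as `MultiOn(api_key=load_api_key())`.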

## Step 2: Create the search agent

Next, we will create an agent session with local mode enabled, which allows us to see the agent in action in our local browser with the MultiOn Chrome Extension. This agent will be responsible for searching and navigating company search results.

Make sure that the MultiOn Chrome Extension is installed and enabled (for more details, see [here](https://docs.multion.ai/learn/browser-extension)).

In [None]:
create_response = client.sessions.create(
    url="https://linkedin.com",
    local=True
)

session_id = create_response.session_id

If you aren't logged in to LinkedIn yet, log in now before continuing. We can then continue in the same session using the `session_id` from the response. We will search for people related to MultiOn and call `step` repeatedly for as long as the agent reports the status `CONTINUE`, which should leave us on the people results page.

In [None]:
status = "CONTINUE"

while status == "CONTINUE":
    step_response = client.sessions.step(
        session_id=session_id,
        cmd="Search for MultiOn and see all people results."
    )
    status = step_response.status
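
The loop above runs for as long as the agent keeps reporting `CONTINUE`, so a confused agent could keep it going indefinitely. One possible guard, sketched here, is to cap the number of steps. The helper `run_until_done` and its `max_steps` parameter are hypothetical names for this example. The only thing assumed about the response is the `status` attribute already used above.

```python
def run_until_done(step_fn, max_steps=10):
    """Call `step_fn()` until its status is no longer "CONTINUE".

    `step_fn` is any zero-argument callable returning an object with a
    `status` attribute. Raises RuntimeError if the status is still
    "CONTINUE" after `max_steps` calls.
    """
    for _ in range(max_steps):
        response = step_fn()
        if response.status != "CONTINUE":
            return response
    raise RuntimeError(f"Agent still running after {max_steps} steps")
```

It could be used by wrapping the call from the cell above, for example `run_until_done(lambda: client.sessions.step(session_id=session_id, cmd="Search for MultiOn and see all people results."))`.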

## Step 3: Scrape search results

Once we are on the results page, we can start retrieving data with `retrieve`.

Since LinkedIn paginates its results, we will have to navigate to the next page with `step` once we have scraped all profiles on the current page. Conveniently, `retrieve` has an option to scroll down to the bottom while retrieving (`scroll_to_bottom`), which we will use to speed up the process. We can also enable `render_js` to get all image URLs.

In [None]:
scraped_profiles = []
has_more = True
page = 1

while has_more:
    # Extract every profile listed on the current results page.
    retrieve_response = client.retrieve(
        session_id=session_id,
        cmd="Get all profiles with name, headline, location, current position, profile URL, and image URL.",
        fields=["name", "headline", "location", "current_position", "profile_url", "image_url"],
        scroll_to_bottom=True,
        render_js=True
    )
    scraped_profiles.extend(retrieve_response.data)
    print(f"Scraped page {page} with {len(retrieve_response.data)} profiles")
    page += 1
    # Ask the agent to advance to the next results page.
    step_response = client.sessions.step(
        session_id=session_id,
        cmd="Click the 'Next' button to go to the next page."
    )
    # Heuristic stop condition: relies on the agent's reply containing
    # "last page" once there is no further page to open.
    has_more = "last page" not in step_response.message

print(f"Scraped {len(scraped_profiles)} profiles:\n{scraped_profiles}")
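
Because pagination is driven by an agent, the same profile may be collected twice (for instance, if a "Next" click does not register), and some entries may come back without a URL. A small clean-up pass, sketched below, can remove both before the detail step. It assumes each scraped item is a dict keyed by the requested field names, as the `profile["profile_url"]` access later in this recipe implies. `dedupe_profiles` is a hypothetical helper, not a MultiOn function.

```python
def dedupe_profiles(profiles):
    """Keep the first profile seen for each profile_url, preserving order.

    Entries without a usable profile_url are dropped, since there would
    be no page to open for them later.
    """
    seen = set()
    unique = []
    for profile in profiles:
        url = (profile.get("profile_url") or "").strip()
        # Ignore query strings and trailing slashes so that tracking
        # parameters do not make one profile look like two.
        key = url.split("?", 1)[0].rstrip("/")
        if not key or key in seen:
            continue
        seen.add(key)
        unique.append(profile)
    return unique
```

Calling `scraped_profiles = dedupe_profiles(scraped_profiles)` before fetching profile details would also avoid paying for duplicate `retrieve` calls.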

## Step 4: Scrape profiles in parallel

To get the full details of each profile, we have to call `retrieve` on each individual profile page. To speed this up, we can call `retrieve` in parallel, once for each profile URL we previously scraped from the search results. Note that these calls run in remote mode, because we call `retrieve` without `session_id` or `local`; you probably don't want more than 10 browser windows open concurrently on your computer.

⚠️ We will be calling `retrieve` many times at once, so beware of costs and rate limits!

In [None]:
from concurrent.futures import ThreadPoolExecutor

def fetch_profile_details(profile):
    profile_details_response = client.retrieve(
        url=profile["profile_url"],
        cmd="Get professional details with about, experience as array, education as array, skills as array, honors as array, and languages as array.",
        fields=["about", "experience", "education", "skills", "honors", "languages"]
    )
    profile["details"] = profile_details_response.data
    return profile

with ThreadPoolExecutor() as executor:
    scraped_profiles = list(executor.map(fetch_profile_details, scraped_profiles))

print(f"Scraped detailed profiles:\n{scraped_profiles}")
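
With `executor.map`, a single failed request raises when the results are collected and the whole batch is lost, and the default pool size depends on the machine. A hedged variant is sketched below: it caps concurrency and records failures per profile instead of raising. The helper `fetch_all`, its `max_workers` default of 4, and the `"error"` key are assumptions made for this example. Suitable limits depend on your plan's rate limits.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(profiles, fetch_fn, max_workers=4):
    """Apply `fetch_fn` to each profile on at most `max_workers` threads.

    A failure is stored under the profile's "error" key rather than
    raised, so the remaining profiles are still processed. Results are
    returned in the same order as the input.
    """
    def safe_fetch(profile):
        try:
            return fetch_fn(profile)
        except Exception as exc:  # record the failure and keep going
            return {**profile, "error": str(exc)}

    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(safe_fetch, profiles))
```

For example, `fetch_all(scraped_profiles, fetch_profile_details)` would reuse the function defined above, and any profile carrying an `"error"` key can be retried afterwards.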